1 Dataset

Save metadatas about the dataset.

1.0.0.1 Figure S1

Save Marey maps in a Supplementary pdf…

1.0.0.2 Figure S2

Save recombination maps in a Supplementary pdf…

1.1 Dataset quality

Assessment of recombination landscapes quality.

# ============================================================================
# Quantitative assessment - Correlation between the chromosome wide recombination
# rate & the mean estimated recombination rate Show the linear relationship
# between chromosome wide rate and the mean estimated recombination rate as a
# cross-validation procedure
# ============================================================================
plot(chromosomes$chrwide.rate, chromosomes$mean.recrate, xlab = "Chromosome-wide recombination rate", 
    ylab = "Mean estimated recombination rate", cex = 1.2, cex.lab = 1.2, cex.axis = 1.2)
abline(a = 0, b = 1, col = "Red")
Correlation between the chromosome wide recombination rate and the mean estimated recombination rate. Identity line drawn in red.

Figure 1.1: Correlation between the chromosome wide recombination rate and the mean estimated recombination rate. Identity line drawn in red.

cor.test(chromosomes$chrwide.rate, chromosomes$mean.recrate, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  chromosomes$chrwide.rate and chromosomes$mean.recrate
## S = 117170, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9976094

Correlation between recombination maps at 100kb and 1Mb.

# Correlation
cor.test(estimates1Mb$rec.rate, estimates1Mb$average100kb, method = "spearman")
## Warning in cor.test.default(estimates1Mb$rec.rate, estimates1Mb$average100kb, :
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  estimates1Mb$rec.rate and estimates1Mb$average100kb
## S = 9.5475e+10, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9975762
plot(estimates1Mb$rec.rate, estimates1Mb$average100kb)

sp_corr$species = gsub("_", " ", sp_corr$species)
colnames(sp_corr) = c("Species", "Number of 1Mb windows", "Rho", "Statistic", "p-value")
kable(sp_corr, digits = 3, align = "c")
Species Number of 1Mb windows Rho Statistic p-value
Aegilops speltoides 1566 1.000 38660.408 0
Arabidopsis thaliana 121 1.000 80.001 0
Arachis duranensis 805 0.998 204003.923 0
Arachis hypogaea 2018 1.000 315.000 0
Boechera stricta 189 0.999 634.535 0
Brachypodium distachyon 138 1.000 10.000 0
Brassica napus 382 1.000 2404.004 0
Brassica rapa 179 1.000 40.000 0
Camelina sativa 447 0.999 6661.284 0
Camellia sinensis 994 1.000 3069.505 0
Capsella rubella 111 0.999 250.000 0
Capsicum annuum 2612 1.000 1116249.438 0
Cenchrus americanus 1535 0.932 38673978.897 0
Citrullus lanatus 309 0.998 8987.501 0
Citrus sinensis 251 1.000 263.030 0
Coffea canephora 587 0.999 43672.649 0
Cucumis melo 339 1.000 1637.193 0
Cucumis sativus 193 1.000 356.000 0
Cucurbita maxima 201 0.985 14935.074 0
Cucurbita pepo 182 0.972 20934.056 0
Dioscorea alata 343 1.000 1372.518 0
Draba nivalis 242 1.000 146.000 0
Elaeis guineensis 519 1.000 4390.011 0
Eucalyptus grandis 614 1.000 2980.000 0
Glycine max 922 0.999 66360.603 0
Gossypium hirsutum 1941 1.000 276988.789 0
Gossypium raimondii 756 1.000 6256.510 0
Helianthus annuus 2240 0.999 935972.729 0
Hordeum vulgare 4581 1.000 4321294.375 0
Juglans regia 721 1.000 13974.110 0
Lupinus albus 442 0.998 28967.238 0
Lupinus angustifolius 564 0.999 24885.071 0
Malus domestica 656 1.000 1636.004 0
Mangifera indica 363 0.998 12414.386 0
Manihot esculenta 391 0.999 5484.362 0
Momordica charantia 297 0.999 3431.625 0
Nelumbo nucifera 812 1.000 4921.502 0
Oryza nivara 373 1.000 761.005 0
Oryza sativa 377 1.000 1280.500 0
Panicum hallii 488 0.991 173277.055 0
Phaseolus vulgaris 520 0.998 39531.933 0
Prunus mume 180 0.999 962.000 0
Prunus persica 227 1.000 96.500 0
Quercus sp 720 1.000 7249.008 0
Raphanus sativus 232 1.000 553.352 0
Sesamum indicum 259 1.000 973.511 0
Setaria italica 405 1.000 154.004 0
Solanum lycopersicum 806 0.995 374629.627 0
Solanum tuberosum 728 0.999 82532.028 0
Sorghum bicolor 663 1.000 7314.467 0
Theobroma cacao 335 1.000 272.000 0
Triticum aestivum 13346 1.000 39822683.941 0
Triticum dicoccoides 5821 1.000 1611892.897 0
Triticum urartu 4611 1.000 441062.644 0
Vigna unguiculata 478 1.000 191.000 0
Vitis vinifera 467 0.988 182751.865 0
Zea mays 2107 1.000 20666.063 0
# Save the table of correlation per species/dataset
write.xlsx(x = sp_corr, file = paste(wd, "tables/article_one_supp/tables_supplementary.xls", 
    sep = ""), sheetName = "S4_MareyMaps_CorrelationScales", row.names = FALSE, append = TRUE)

1.1.1 Correction of the linkage map length uncorrected/corrected (map coverage)

Two methods to correct linkage map length 1. Chakravarti et al. (1991) 2. Hall and Willis (2005)

Corrected linkkage map length don’t change significantly the raw linkage map length.

corr_linkagemap = chromosome.stats$linkage.map.length - chromosome.stats$linkage.map.length.correctedHW
summary(corr_linkagemap)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -5.6216 -1.7449 -0.8285 -1.1891 -0.3466  0.0000
n.boot = 1000
boot = numeric(n.boot)
for (i in 1:n.boot) {
    boot[i] = mean(sample(corr_linkagemap, replace = TRUE), na.rm = TRUE)
}
mean(boot)
## [1] -1.184658
quantile(boot, c(0.025, 0.975))
##      2.5%     97.5% 
## -1.271591 -1.096404

1.1.2 Genome coverage

# The difference between genomic map length (Mb) in the Marey map and the total
# chromosome size from the fasta file.
summary(chromosome.stats$phys.map.length * 10^6 - chromosome.stats$chromosome.size.bp)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.490e-08  0.000e+00  0.000e+00  3.081e-11  0.000e+00  7.451e-09

1.1.3 Marker density

Number of markers

summary(chromosome.stats$nb.markers)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    31.0   117.0   224.0   956.6   581.0 49483.0
hist(chromosome.stats$nb.markers, breaks = 300, main = "", xlab = "Number of markers")

hist(chromosome.stats$nb.markers, breaks = 30000, xlim = c(0, 1000), main = "", xlab = "Number of markers")

Marker density (nb of markers/total map length); i.e. the averaged number of markers per cM/Mb, more is better.

summary(chromosome.stats$density.markers.cM)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.3571   1.0172   2.2403   8.3394   4.6830 359.1491
hist(chromosome.stats$density.markers.cM, breaks = 80, main = "", xlab = "Marker density (cM)")

summary(chromosome.stats$density.markers.bp * 10^6)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.1563   2.4660   4.8309  20.3394  14.2718 696.2378
hist(chromosome.stats$density.markers.bp * 10^6, breaks = 80, main = "", xlab = "Marker density (Mb)")

hist(chromosome.stats$density.markers.bp * 10^6, breaks = 800, main = "", xlab = "Marker density (Mb)", 
    xlim = c(0, 100))

Mean interval between markers (cM/Mb)

TODO - median interval (less sensitive to extreme values, i.e. large gaps)

summary(chromosome.stats$marker.interval.cM)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.002785 0.212289 0.447341 0.659125 0.995402 2.810800
hist(chromosome.stats$marker.interval.cM, breaks = 80, main = "", xlab = "Mean interval between adjacent markers (cM)")

summary(chromosome.stats$marker.interval.bp/10^6)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.001436 0.069858 0.207525 0.341234 0.403838 6.522944
hist(chromosome.stats$marker.interval.bp/10^6, breaks = 80, main = "", xlab = "Mean interval between adjacent markers (Mb)")

hist(chromosome.stats$marker.interval.bp/10^6, breaks = 80, main = "", xlab = "Mean interval between adjacent markers (Mb)", 
    xlim = c(0, 2))

1.1.4 How many maps discarded

Number of dataset discarded.

Number of chromosomes discarded among retained dataset.

maps = read.table(file = paste0(wd, "/data-cleaned/marey_maps/AllMaps.txt"), header = TRUE, 
    sep = "\t")

# All chromosomes - import Marey maps
set_list = paste(wd, "/data-cleaned/marey_maps/", unique(maps$set), ".txt", sep = "")
allmaps = do.call("rbind", lapply(set_list, FUN = function(file) {
    read.table(file, header = TRUE, sep = "\t")
}))
# Chromosomes in dataset
length(unique(paste(allmaps$set, allmaps$map, sep = "_")))
## [1] 682
# Chromosomes retained
length(unique(paste(maps$set[which(maps$vld)], maps$map[which(maps$vld)], sep = "_")))
## [1] 665

1.2 Phylogeny

Phylogenetic tree of sampled species.

## 
## Phylogenetic tree with 57 tips and 56 internal nodes.
## 
## Tip labels:
##   Manihot esculenta, Cucumis sativus, Cucumis melo, Citrullus lanatus, Cucurbita pepo, Cucurbita maxima, ...
## Node labels:
##   Root, Pentapetalae, mrcaott2ott8384, rosids, mrcaott2ott371, mrcaott371ott579, ...
## 
## Rooted; includes branch lengths.

1.2.0.1 Figure S3

Phylogenetic tree representing sampled species and annotated with mean recombination rates and mean chromosome sizes.

Figure 1.2: Phylogenetic tree representing sampled species and annotated with mean recombination rates and mean chromosome sizes.

1.3 Life history traits

1.3.1 Mating system

lht = openxlsx::read.xlsx(paste(wd, "data-cleaned/ListPlant_Thomas.xlsx", sep = ""), 
    sheet = 1)
lht$sp = paste(lht$Genus, " ", lht$Species, sep = "")
chr_lht = chromosome.stats
chr_lht$mating = NA
for (i in 1:nrow(chr_lht)) {
    chr_lht$mating[i] = lht$Mating_system[which(lht$sp == chr_lht$species[i])]
}
lht$recrate = NA
lht$linkagemaplength = NA
for (i in 1:nrow(lht)) {
    lht$recrate[i] = mean(chr_lht$mean.recrate[which(chr_lht$species == lht$sp[i])], 
        na.rm = TRUE)
    lht$linkagemaplength[i] = mean(chr_lht$linkage.map.length.correctedHW[which(chr_lht$species == 
        lht$sp[i])], na.rm = TRUE)
}
# Sample sizes
table(lht$Mating_system)
## 
##       Mixed Outcrossing     Selfing 
##          12          21          20
table(chr_lht$mating)
## 
##       Mixed Outcrossing     Selfing 
##         163         200         211
boxplot(recrate ~ Mating_system, data = lht, xlab = "Mean recombination rate", ylab = "Linkage map length (cM)")

kruskal.test(recrate ~ Mating_system, data = lht)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  recrate by Mating_system
## Kruskal-Wallis chi-squared = 1.7842, df = 2, p-value = 0.4098
boxplot(linkagemaplength ~ Mating_system, data = lht, xlab = "Mating system", ylab = "Linkage map length (cM)")

kruskal.test(linkagemaplength ~ Mating_system, data = lht)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  linkagemaplength by Mating_system
## Kruskal-Wallis chi-squared = 0.90034, df = 2, p-value = 0.6375
boxplot(mean.recrate ~ mating, data = chr_lht, xlab = "Mean recombination rate", 
    ylab = "Linkage map length (cM)")

boxplot(linkage.map.length ~ mating, data = chr_lht, xlab = "Mating system", ylab = "Linkage map length (cM)")

1.4 Position of the centromeric index on the genome

False Positive Rate. How many centromeric indexes have been assigned to the wrong side of the chromosome? Compare the recombination rate at the centromeric index with the putative position on the opposite direction, thus the FPR is the percentage of CI that have a higher recombination rate than the opposite position.

sum(!is.na(df$true_CI_rate))
## [1] 424
# Number of centromeric indexes wrongly oriented (i.e. lower recombination rate
# on the opposite position)
sum(df$true_CI_rate > df$false_CI_rate, na.rm = TRUE)
## [1] 70
# Proportion -> the FPR
sum(df$true_CI_rate > df$false_CI_rate, na.rm = TRUE)/sum(!is.na(df$true_CI_rate))
## [1] 0.1650943
# Identify these chromosomes to put them asides
idx_wrongCIorientation = which(df$true_CI_rate > df$false_CI_rate)
paste(df$set[which(df$true_CI_rate > df$false_CI_rate)], df$chromosome[which(df$true_CI_rate > 
    df$false_CI_rate)])
##  [1] "Boechera_stricta_Lee2020 2"                
##  [2] "Boechera_stricta_Lee2020 5"                
##  [3] "Brassica_napus_Yang2017 A07"               
##  [4] "Brassica_rapa_CodyMarkelz2017 A01"         
##  [5] "Capsella_rubella_Slotte2013 1"             
##  [6] "Capsella_rubella_Slotte2013 4"             
##  [7] "Capsicum_annuum_Han2016 3"                 
##  [8] "Capsicum_annuum_Han2016 12"                
##  [9] "Cenchrus_americanus_Pucher2017 2"          
## [10] "Cenchrus_americanus_Pucher2017 6"          
## [11] "Citrullus_lanatus_Ren2015 3"               
## [12] "Citrullus_lanatus_Ren2015 7"               
## [13] "Citrullus_lanatus_Ren2015 10"              
## [14] "Citrus_sinensis_Huang2018 4"               
## [15] "Coffea_canephora_Crouzillat2020 A"         
## [16] "Coffea_canephora_Crouzillat2020 K"         
## [17] "Cucumis_melo_Pereira2018 4"                
## [18] "Cucumis_melo_Pereira2018 7"                
## [19] "Cucumis_melo_Pereira2018 9"                
## [20] "Cucumis_sativus_Zhu2016 6"                 
## [21] "Cucumis_sativus_Zhu2016 5"                 
## [22] "Elaeis_guineensis_Yaakub2020 5"            
## [23] "Elaeis_guineensis_Yaakub2020 1"            
## [24] "Elaeis_guineensis_Yaakub2020 12"           
## [25] "Eucalyptus_grandis_Bertholome2015 1"       
## [26] "Eucalyptus_grandis_Bertholome2015 6"       
## [27] "Eucalyptus_grandis_Bertholome2015 7"       
## [28] "Eucalyptus_grandis_Bertholome2015 8"       
## [29] "Gossypium_hirsutum_Zhang2019 A11"          
## [30] "Lupinus_angustifolius_Zhou2017 8"          
## [31] "Lupinus_angustifolius_Zhou2017 10"         
## [32] "Lupinus_angustifolius_Zhou2017 19"         
## [33] "Lupinus_angustifolius_Zhou2017 15"         
## [34] "Lupinus_angustifolius_Zhou2017 3"          
## [35] "Lupinus_angustifolius_Zhou2017 7"          
## [36] "Malus_domestica_DiPierro2016 3"            
## [37] "Manihot_esculenta_ICGMC2015 4"             
## [38] "Manihot_esculenta_ICGMC2015 6"             
## [39] "Manihot_esculenta_ICGMC2015 8"             
## [40] "Manihot_esculenta_ICGMC2015 12"            
## [41] "Momordica_charantia_Mastumura2020 2"       
## [42] "Nelumbo_nucifera_Gui2018 1"                
## [43] "Nelumbo_nucifera_Gui2018 5"                
## [44] "Oryza_nivara_Ma2016 9"                     
## [45] "Oryza_nivara_Ma2016 12"                    
## [46] "Oryza_sativa_DeLeon2016 3"                 
## [47] "Oryza_sativa_DeLeon2016 5"                 
## [48] "Oryza_sativa_DeLeon2016 6"                 
## [49] "Phaseolus_vulgaris_Song2015 4"             
## [50] "Phaseolus_vulgaris_Song2015 10"            
## [51] "Phaseolus_vulgaris_Song2015 11"            
## [52] "Prunus_persica_Verde2017 3"                
## [53] "Sesamum_indicum_Wang2016 1"                
## [54] "Sesamum_indicum_Wang2016 3"                
## [55] "Sesamum_indicum_Wang2016 4"                
## [56] "Solanum_lycopersicum_Gonda2018 12"         
## [57] "Solanum_tuberosum_Endelman2016 1"          
## [58] "Solanum_tuberosum_Endelman2016 4"          
## [59] "Sorghum_bicolor_Zou2012 6"                 
## [60] "Sorghum_bicolor_Zou2012 7"                 
## [61] "Sorghum_bicolor_Zou2012 10"                
## [62] "Theobroma_cacao_Royaert2016 2"             
## [63] "Theobroma_cacao_Royaert2016 6"             
## [64] "Theobroma_cacao_Royaert2016 7"             
## [65] "Triticum_aestivum_GutierrezGonzalez2019 4B"
## [66] "Triticum_aestivum_GutierrezGonzalez2019 4D"
## [67] "Vitis_vinifera_Brault2020 4"               
## [68] "Vitis_vinifera_Brault2020 6"               
## [69] "Vitis_vinifera_Brault2020 9"               
## [70] "Vitis_vinifera_Brault2020 12"
# The distribution of differences Values below 0 indicates that recombination
# rates are lower on the opposite position
hist(df$false_CI_rate - df$true_CI_rate, main = "", xlab = "Differences inferred/alternative centromere position", 
    breaks = 30)

# How many of wrongly oriented CI have a small difference between true and false
# position? e.g. less than 0.5 cM/Mb
threshold = 1
sum(df$true_CI_rate > df$false_CI_rate & (df$false_CI_rate - df$true_CI_rate) > -threshold, 
    na.rm = TRUE)
## [1] 45
sum(df$true_CI_rate > df$false_CI_rate & (df$false_CI_rate - df$true_CI_rate) > -threshold, 
    na.rm = TRUE)/sum(df$true_CI_rate > df$false_CI_rate, na.rm = TRUE)
## [1] 0.6428571

2 Smaller chromosomes recombine more than larger ones

Test the correlation between chromosome size and recombination rate.

# Sample size
nrow(chromosome.stats)
## [1] 665

How many chromosomes with less than one CO?

length(which(chromosome.stats$linkage.map.length < 50))
## [1] 11
# proportion
length(which(chromosome.stats$linkage.map.length < 50))/nrow(chromosome.stats)
## [1] 0.01654135

How often do chromosome arms have multiple crossovers, versus a single one?

length(which(chromosome.stats$linkage.map.length > 50 & chromosome.stats$linkage.map.length < 
    100))
## [1] 234
length(which(chromosome.stats$linkage.map.length > 100))
## [1] 419
# proportion
length(which(chromosome.stats$linkage.map.length < 50))/nrow(chromosome.stats)
## [1] 0.01654135
# Spearman correlation
cor.test(chromosome.stats$phys.map.length, chromosome.stats$mean.recrate, method = "spearman")
## Warning in cor.test.default(chromosome.stats$phys.map.length,
## chromosome.stats$mean.recrate, : Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  chromosome.stats$phys.map.length and chromosome.stats$mean.recrate
## S = 90278103, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.8419156
# Spearman correlation with Phylogenetically Independent Contrasts
# cor_phylo(log10(mean.recrate) ~ log10(phys.map.length), data =
# phylogenetic_traits, species = phylogenetic_traits$Species, phy = tree)

# Test the correlation with PGLMM
pglmm_mod = pglmm(log10(mean.recrate) ~ log10(phys.map.length) + (1 | Species__), 
    data = phylogenetic_traits, family = "gaussian", cov_ranef = list(Species = tree))

summary(pglmm_mod)
## Linear mixed model fit by restricted maximum likelihood
## 
## Call:log10(mean.recrate) ~ log10(phys.map.length)
## <environment: 0x7f83c9ac96c8>
## 
##  logLik     AIC     BIC 
##   654.9 -1299.8 -1283.0 
## 
## Random effects:
##             Variance Std.Dev
## 1|Species   0.038892  0.1972
## 1|Species__ 0.022891  0.1513
## residual    0.005329  0.0730
## 
## Fixed effects:
##                            Value Std.Error  Zscore    Pvalue    
## (Intercept)             0.867630  0.111049   7.813 5.584e-15 ***
## log10(phys.map.length) -0.313184  0.027989 -11.190 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.0.1 Model selection to identify the model that best fit the data.

Table 2.1: Model selection based on AIC/BIC
model formula AIC BIC
LM log10(mean.recrate) ~ log10(phys.map.length) -388.852 -375.3527
LMER log10(mean.recrate) ~ log10(phys.map.length) + (1 | Species) -1275.590 -1257.5906
LMER log10(mean.recrate) ~ log10(phys.map.length) + (log10(phys.map.length) | Species) -1347.495 -1320.4960
PGLMM log10(mean.recrate) ~ log10(phys.map.length) + (1|Species__) -1286.970 -1273.5493
write.xlsx(x = model_selection, file = paste(wd, "tables/article_one_supp/tables_supplementary.xls", 
    sep = ""), sheetName = "S5_ModelSelection", row.names = FALSE, append = TRUE)

2.0.2 Representing the relationship between chromosome size and recombination rate.

## 
## Call:
## lm(formula = log10(chromosome.stats$mean.recrate) ~ log10(phys.map.length), 
##     data = chromosome.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68324 -0.14576  0.01838  0.13625  0.48238 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.90461    0.02717   70.09   <2e-16 ***
## log10(phys.map.length) -0.90668    0.01569  -57.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1801 on 663 degrees of freedom
## Multiple R-squared:  0.8344, Adjusted R-squared:  0.8342 
## F-statistic:  3341 on 1 and 663 DF,  p-value: < 2.2e-16
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log10(mean.recrate) ~ log10(phys.map.length) + (1 | species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: -1283.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5275 -0.5818 -0.0144  0.5723  3.2839 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 0.098403 0.3137  
##  Residual             0.005344 0.0731  
## Number of obs: 665, groups:  species, 57
## 
## Fixed effects:
##                         Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)              0.89480    0.06329 193.45259   14.14   <2e-16 ***
## log10(phys.map.length)  -0.32494    0.02770 662.81832  -11.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## lg10(phy..) -0.753
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
##            R2m       R2c
## [1,] 0.1680834 0.9571483

2.0.2.1 Figure 1

(ref:meanrecrate) Negative correlation of recombination rates (cM/Mb, log scale) with chromosome size (Mb, log scale). Recombination rates were estimated with loess regression in windows of 100kb. Each point represents a chromosome (n=665). Species are presented in different colors (57 species). (A) The black solid line represents the linear model regression line fitted to the data. The expected simulated regression line is in red, assuming on average two crossovers per chromosome for the solid line (100cM), within an interval of 1 to 3 crossovers per chromosome (respectively lower and upper dashed red lines). (B) Within species relationships between recombination rates and chromosome size. The black dashed line represents the selected Linear Mixed Model (LMER). Colored lines for species random regressions.

Figure 2.1: (ref:meanrecrate) Negative correlation of recombination rates (cM/Mb, log scale) with chromosome size (Mb, log scale). Recombination rates were estimated with loess regression in windows of 100kb. Each point represents a chromosome (n=665). Species are presented in different colors (57 species). (A) The black solid line represents the linear model regression line fitted to the data. The expected simulated regression line is in red, assuming on average two crossovers per chromosome for the solid line (100cM), within an interval of 1 to 3 crossovers per chromosome (respectively lower and upper dashed red lines). (B) Within species relationships between recombination rates and chromosome size. The black dashed line represents the selected Linear Mixed Model (LMER). Colored lines for species random regressions.

2.0.3 Relationship between linkage map length and chromosome length. Linkage map length is a measure of the absolute number of crossover per chromosome.

cor.test(chromosomes$phys.map.length, chromosomes$linkage.map.length.correctedHW, 
    method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  chromosomes$phys.map.length and chromosomes$linkage.map.length.correctedHW
## S = 40018775, p-value = 1.899e-06
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.1835096
## 
## Call:
## lm(formula = linkage.map.length.correctedHW ~ phys.map.length, 
##     data = chromosome.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -88.612 -37.943  -7.066  28.798 236.029 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     116.67034    2.07605   56.20  < 2e-16 ***
## phys.map.length   0.05707    0.01121    5.09 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.04 on 663 degrees of freedom
## Multiple R-squared:  0.03761,    Adjusted R-squared:  0.03616 
## F-statistic: 25.91 on 1 and 663 DF,  p-value: 4.665e-07
## [1] 0.03761008
## attr(,"adj.r.squared")
## [1] 0.03761108
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: linkage.map.length.correctedHW ~ phys.map.length + (phys.map.length |  
##     species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 6226.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3021 -0.5737 -0.0653  0.4890  4.9865 
## 
## Random effects:
##  Groups   Name            Variance Std.Dev. Corr 
##  species  (Intercept)     1667.311 40.833        
##           phys.map.length    3.557  1.886   -0.39
##  Residual                  389.668 19.740        
## Number of obs: 665, groups:  species, 57
## 
## Fixed effects:
##                 Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)      41.8231     6.8216 56.3289   6.131 9.12e-08 ***
## phys.map.length   2.0955     0.2736 44.4002   7.659 1.18e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## phys.mp.lng -0.489
##            R2m       R2c
## [1,] 0.4850092 0.9983054
## 
## Call:
## lm(formula = excessCO ~ chrrelativesize, data = chromosome.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -95.996 -34.787  -5.097  28.758 210.634 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -4.119      7.715  -0.534    0.594    
## chrrelativesize   76.180      7.526  10.122   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 43.68 on 663 degrees of freedom
## Multiple R-squared:  0.1338, Adjusted R-squared:  0.1325 
## F-statistic: 102.5 on 1 and 663 DF,  p-value: < 2.2e-16
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: excessCO ~ chrrelativesize + (chrrelativesize | species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 6117.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3118 -0.5617 -0.0818  0.4922  5.0704 
## 
## Random effects:
##  Groups   Name            Variance Std.Dev. Corr 
##  species  (Intercept)     1663.9   40.79         
##           chrrelativesize 2708.3   52.04    -0.61
##  Residual                  384.9   19.62         
## Number of obs: 665, groups:  species, 57
## 
## Fixed effects:
##                 Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)      -13.260      6.925  55.336  -1.915   0.0607 .  
## chrrelativesize   86.551      8.104  49.811  10.680 1.76e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## chrrelatvsz -0.726
##            R2m       R2c
## [1,] 0.1411685 0.8570205

2.0.3.1 Figure 2

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: excessCO ~ chrrelativesize + (chrrelativesize | species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 6117.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3118 -0.5617 -0.0818  0.4922  5.0704 
## 
## Random effects:
##  Groups   Name            Variance Std.Dev. Corr 
##  species  (Intercept)     1663.9   40.79         
##           chrrelativesize 2708.3   52.04    -0.61
##  Residual                  384.9   19.62         
## Number of obs: 665, groups:  species, 57
## 
## Fixed effects:
##                 Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)      -13.260      6.925  55.336  -1.915   0.0607 .  
## chrrelativesize   86.551      8.104  49.811  10.680 1.76e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## chrrelatvsz -0.726
(ref:linkagemaplength-chrsize) Relationship between linkage map length and chromosome length. (A) Correlation between chromosome length (Mb) and linkage map length (cM). Each point represents a chromosome (n=665). For clarity, we substracted 50cM to the linkage map length to remove the mandatory crossover and represent only the excess of crossovers. Species are presented in different colors (57 species). The linear regression is the solid black line. The fixed regression of the Linear Mixed Model is the dashed black line. Species random slopes are in colors. Isolines of the Genome-wide Recombination Rate (GwRR) were plotted in dotted red lines to represent regions of equal recombination rates. (B) Species random intercepts as a function of the specific mean chromosome size (Mb). (C) Species random slopes as a function of the specific mean chromosome size (Mb). (D) Within species effect of relative chromosome size on linkage map length. Each point represents a chromosome (n=665). Species are presented in different colors (57 species). Black solid line is the linear regression across species while dashed black line is the linear mixed regression with a species random effect. Colored solid lines represent the Linear Mixed Model regression line (LMER) fitted to the data, with random regressions fitted to species.

Figure 2.2: (ref:linkagemaplength-chrsize) Relationship between linkage map length and chromosome length. (A) Correlation between chromosome length (Mb) and linkage map length (cM). Each point represents a chromosome (n=665). For clarity, we substracted 50cM to the linkage map length to remove the mandatory crossover and represent only the excess of crossovers. Species are presented in different colors (57 species). The linear regression is the solid black line. The fixed regression of the Linear Mixed Model is the dashed black line. Species random slopes are in colors. Isolines of the Genome-wide Recombination Rate (GwRR) were plotted in dotted red lines to represent regions of equal recombination rates. (B) Species random intercepts as a function of the specific mean chromosome size (Mb). (C) Species random slopes as a function of the specific mean chromosome size (Mb). (D) Within species effect of relative chromosome size on linkage map length. Each point represents a chromosome (n=665). Species are presented in different colors (57 species). Black solid line is the linear regression across species while dashed black line is the linear mixed regression with a species random effect. Colored solid lines represent the Linear Mixed Model regression line (LMER) fitted to the data, with random regressions fitted to species.

(ref:linkagemaplength-chrsize) Relationship between linkage map length and chromosome length. (A) Correlation between chromosome length (Mb) and linkage map length (cM). Each point represents a chromosome (n=665). For clarity, we substracted 50cM to the linkage map length to remove the mandatory crossover and represent only the excess of crossovers. Species are presented in different colors (57 species). The linear regression is the solid black line. The fixed regression of the Linear Mixed Model is the dashed black line. Species random slopes are in colors. Isolines of the Genome-wide Recombination Rate (GwRR) were plotted in dotted red lines to represent regions of equal recombination rates. (B) Species random intercepts as a function of the specific mean chromosome size (Mb). (C) Species random slopes as a function of the specific mean chromosome size (Mb). (D) Within species effect of relative chromosome size on linkage map length. Each point represents a chromosome (n=665). Species are presented in different colors (57 species). Black solid line is the linear regression across species while dashed black line is the linear mixed regression with a species random effect. Colored solid lines represent the Linear Mixed Model regression line (LMER) fitted to the data, with random regressions fitted to species.

Figure 2.3: (ref:linkagemaplength-chrsize) Relationship between linkage map length and chromosome length. (A) Correlation between chromosome length (Mb) and linkage map length (cM). Each point represents a chromosome (n=665). For clarity, we substracted 50cM to the linkage map length to remove the mandatory crossover and represent only the excess of crossovers. Species are presented in different colors (57 species). The linear regression is the solid black line. The fixed regression of the Linear Mixed Model is the dashed black line. Species random slopes are in colors. Isolines of the Genome-wide Recombination Rate (GwRR) were plotted in dotted red lines to represent regions of equal recombination rates. (B) Species random intercepts as a function of the specific mean chromosome size (Mb). (C) Species random slopes as a function of the specific mean chromosome size (Mb). (D) Within species effect of relative chromosome size on linkage map length. Each point represents a chromosome (n=665). Species are presented in different colors (57 species). Black solid line is the linear regression across species while dashed black line is the linear mixed regression with a species random effect. Colored solid lines represent the Linear Mixed Model regression line (LMER) fitted to the data, with random regressions fitted to species.

2.0.3.2 Figure S4

Yet, it is difficult to imagine a biological mechanism that is able to consider relative sizes instead of absolute sizes. Given that number of genes are roughly constant across species and assuming that genes are distributed proportionnally to relative chromosome sizes within species, we can imagine that number of genes of a chromosome could be a good proxy of the relative size of the chromosome.

Yet, number of genes is not constant across species in our dataset, as expected if annotations are not standard and if some overlapping genes/alternative variants increases the number of gene copies. Besides, we have unequal sampling across species and for some species, we have missing chromosomes.

Hence we decided to divide gene count by the mean gene count of the species (i.e. a relative gene count) as a good proxy of relative chromosome size and compared with standardizing by the total gene count (despite uneven sampling.

Finally we kept the standardization by the total gene count.

Table 2.2: Gene count per species
Species Number of chromosomes Mean gene count Total number of genes
Aegilops speltoides 2 6294 12587
Arabidopsis thaliana 5 5489 27445
Arachis duranensis 8 4228 33823
Arachis hypogaea 18 3592 64656
Brachypodium distachyon 3 5585 16756
Brassica napus 12 5398 64778
Brassica rapa 7 4845 33915
Camelina sativa 15 4454 66816
Camellia sinensis 5 2261 11306
Capsella rubella 7 3351 23457
Cucumis melo 11 2316 25478
Cucumis sativus 7 3397 23780
Cucurbita maxima 19 1532 29116
Cucurbita pepo 17 1621 27554
Dioscorea alata 14 976 13657
Elaeis guineensis 11 1610 17706
Eucalyptus grandis 11 3102 34121
Glycine max 19 2715 51577
Gossypium hirsutum 26 2650 68900
Gossypium raimondii 13 2914 37885
Helianthus annuus 13 2973 38648
Hordeum vulgare 7 5368 37577
Lupinus albus 25 1614 40359
Lupinus angustifolius 20 1475 29495
Malus domestica 17 2485 42250
Manihot esculenta 18 1772 31901
Oryza nivara 12 3026 36313
Oryza sativa 12 3154 37849
Panicum hallii 9 3742 33680
Phaseolus vulgaris 11 2545 27996
Prunus mume 7 2784 19487
Prunus persica 8 3351 26811
Sesamum indicum 13 1707 22193
Setaria italica 9 3807 34264
Solanum lycopersicum 12 2827 33925
Solanum tuberosum 12 3124 37482
Sorghum bicolor 10 3403 34027
Theobroma cacao 10 2899 28993
Triticum aestivum 20 4989 99771
Vigna unguiculata 11 2863 31496
Vitis vinifera 19 1399 26588
Zea mays 10 3900 39005
## 
## Call:
## lm(formula = excessCO ~ genecount_std, data = chromosome.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.982 -39.667  -0.738  28.957 152.449 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     55.453      3.679  15.072  < 2e-16 ***
## genecount_std  217.282     38.156   5.695 2.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.55 on 513 degrees of freedom
##   (150 observations deleted due to missingness)
## Multiple R-squared:  0.05945,    Adjusted R-squared:  0.05762 
## F-statistic: 32.43 on 1 and 513 DF,  p-value: 2.085e-08
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: excessCO ~ genecount_std + (1 | species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 4682.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6265 -0.5573 -0.0369  0.4918  4.6119 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 3917.9   62.59   
##  Residual              364.4   19.09   
## Number of obs: 515, groups:  species, 42
## 
## Fixed effects:
##               Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)     -9.744     10.428  46.530  -0.934    0.355    
## genecount_std  789.149     35.573 512.872  22.184   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## genecnt_std -0.366
##            R2m      R2c
## [1,] 0.2783178 0.938588
(ref:linkagemaplength-genecount) Relationship between linkage map length and relative number of genes. Each point represents a chromosome (n=515). Species are presented in different colors (42 species). Black solid line is the linear regression across species while dashed black line is the linear mixed regression with a species random effect. Colored solid lines represent the Linear Mixed Model regression line (LMER) fitted to the data, with random regressions fitted to species.

Figure 2.4: (ref:linkagemaplength-genecount) Relationship between linkage map length and relative number of genes. Each point represents a chromosome (n=515). Species are presented in different colors (42 species). Black solid line is the linear regression across species while dashed black line is the linear mixed regression with a species random effect. Colored solid lines represent the Linear Mixed Model regression line (LMER) fitted to the data, with random regressions fitted to species.

2.1 A model explaining variation in linkage map length by chromosome relative size

Ultimately, we can think of a model:

\[ d = 50 + a*n_i \] \[ d = 50 + a*n*\frac{L_i}{L} \] Where \(n\) is the number of genes of the species, \(n_i\) the number of genes in chromosome \(i\), \(L\) the total physical length of the genome and \(L_i\) the physical length of chromosome \(i\), and, finally, \(a\) is a coefficient supposedly constant across species.

3 Intra-chromosomal heterogeneity in recombination

We observed contrasted patterns between species.

(ref:heterogeneity-recombination-maps) The diversity of recombination landscapes in angiosperms demonstrated in six different species. Recombination landscapes are similar within species (pooled chromosomes). Physical distances were scaled for comparison of chromosomes with different sizes. Estimates of recombination rates were obtained by 1,000 bootstraps of loci in windows of 100kb with loess regression and automatic span calibration. The mean species landscape (dashed line) was estimated by computing the mean recombination rate in 100 bins along the chromosome axis, all chromosomes pooled. Similarly, the lower and upper boundaries (pale ribbon) were estimated by taking the minimum and maximum recombination rates in 100 bins. One chromosome is represented in a solid line for each species, with the physical position of the centromere pointed by a dot. Species ordered by ascending mean chromosome size. (A) Capsella rubella chromosome 1 and n = 7 chromosomes pooled (B) Arabidopsis thaliana chromosome 1 and n = 5 chromosomes pooled (C) Malus domestica chromosome 2 and n = 17 chromosomes pooled (D) Eucalyptus grandis chromosome 10 and n = 11 chromosomes pooled (E) Nelumbo nucifera chromosome 3 and n = 8 chromosomes pooled (F) Zea mays chromosome 10 and n = 10 chromosomes pooled.

Figure 3.1: (ref:heterogeneity-recombination-maps) The diversity of recombination landscapes in angiosperms demonstrated in six different species. Recombination landscapes are similar within species (pooled chromosomes). Physical distances were scaled for comparison of chromosomes with different sizes. Estimates of recombination rates were obtained by 1,000 bootstraps of loci in windows of 100kb with loess regression and automatic span calibration. The mean species landscape (dashed line) was estimated by computing the mean recombination rate in 100 bins along the chromosome axis, all chromosomes pooled. Similarly, the lower and upper boundaries (pale ribbon) were estimated by taking the minimum and maximum recombination rates in 100 bins. One chromosome is represented in a solid line for each species, with the physical position of the centromere pointed by a dot. Species ordered by ascending mean chromosome size. (A) Capsella rubella chromosome 1 and n = 7 chromosomes pooled (B) Arabidopsis thaliana chromosome 1 and n = 5 chromosomes pooled (C) Malus domestica chromosome 2 and n = 17 chromosomes pooled (D) Eucalyptus grandis chromosome 10 and n = 11 chromosomes pooled (E) Nelumbo nucifera chromosome 3 and n = 8 chromosomes pooled (F) Zea mays chromosome 10 and n = 10 chromosomes pooled.

3.0.0.1 Figure 3

3.1 Broken stick model

We used the broken stick model (k = 10) to visually describe the diversity of patterns.

3.1.0.1 Figure 4

(ref:broken-stick) Structure of the recombination within chromosomes. relative recombination rates along the chromosome estimated in ten bins under the broken stick model, with species ordered by ascending variance of relative recombination rates (n=665). The relative recombination rate is the log-transformed ratio of the expected relative genetic length (one tenth) divided by observed relative genetic length. It means that values below zero are recombination rates lower than expected while values above zero are recombination rates higher than expected. When information is available, the centromere position on the chromosome is mapped as a red dot and chromosomes are oriented with the longer arm on top.

Figure 3.2: (ref:broken-stick) Structure of the recombination within chromosomes. relative recombination rates along the chromosome estimated in ten bins under the broken stick model, with species ordered by ascending variance of relative recombination rates (n=665). The relative recombination rate is the log-transformed ratio of the expected relative genetic length (one tenth) divided by observed relative genetic length. It means that values below zero are recombination rates lower than expected while values above zero are recombination rates higher than expected. When information is available, the centromere position on the chromosome is mapped as a red dot and chromosomes are oriented with the longer arm on top.

(ref:broken-stick) Structure of the recombination within chromosomes. relative recombination rates along the chromosome estimated in ten bins under the broken stick model, with species ordered by ascending variance of relative recombination rates (n=665). The relative recombination rate is the log-transformed ratio of the expected relative genetic length (one tenth) divided by observed relative genetic length. It means that values below zero are recombination rates lower than expected while values above zero are recombination rates higher than expected. When information is available, the centromere position on the chromosome is mapped as a red dot and chromosomes are oriented with the longer arm on top.

Figure 3.3: (ref:broken-stick) Structure of the recombination within chromosomes. relative recombination rates along the chromosome estimated in ten bins under the broken stick model, with species ordered by ascending variance of relative recombination rates (n=665). The relative recombination rate is the log-transformed ratio of the expected relative genetic length (one tenth) divided by observed relative genetic length. It means that values below zero are recombination rates lower than expected while values above zero are recombination rates higher than expected. When information is available, the centromere position on the chromosome is mapped as a red dot and chromosomes are oriented with the longer arm on top.

3.2 Heterogeneity measures

3.2.1 Comparative approach - choice of the statistic

Three statistics for heterogeneity: * coefficient of variation * Gini coefficient * variance in broken stick proportions

Model comparison approach:

lm_model = lm(cv.recrate ~ phys.map.length, data = chromosome.stats)
AIC(lm_model)
## [1] 754.5539
BIC(lm_model)
## [1] 768.0533
r.squaredLR(lm_model)
## [1] 0.1728832
## attr(,"adj.r.squared")
## [1] 0.2362929
# par(mfrow = c(2,2)) plot(lm_model, which = c(1,2,3,5)) par(mfrow = c(1,1))

lm_model = lm(gini ~ phys.map.length, data = chromosome.stats)
AIC(lm_model)
## [1] -637.5537
BIC(lm_model)
## [1] -624.0544
r.squaredLR(lm_model)
## [1] 0.1801051
## attr(,"adj.r.squared")
## [1] -0.1555344
# par(mfrow = c(2,2)) plot(lm_model, which = c(1,2,3,5)) par(mfrow = c(1,1))

lm_model = lm(brokenstick_pvariance_scaled ~ phys.map.length, data = chromosome.stats)
AIC(lm_model)
## [1] 1758.335
BIC(lm_model)
## [1] 1771.775
r.squaredLR(lm_model)
## [1] 0.1381831
## attr(,"adj.r.squared")
## [1] 0.1467909
# par(mfrow = c(2,2)) plot(lm_model, which = c(1,2,3,5)) par(mfrow = c(1,1))
lm_model = lm(cv.recrate ~ phys.map.length, data = species.stats)
AIC(lm_model)
## [1] 57.17865
BIC(lm_model)
## [1] 63.3078
r.squaredLR(lm_model)
## [1] 0.2443293
## attr(,"adj.r.squared")
## [1] 0.3530197
lm_model = lm(pvariance ~ phys.map.length, data = species.stats)
AIC(lm_model)
## [1] 142.3715
BIC(lm_model)
## [1] 148.5007
r.squaredLR(lm_model)
## [1] 0.2263934
## attr(,"adj.r.squared")
## [1] 0.2436196

3.2.2 Larger chromosomes have higher heterogeneity?

To test the hypothesis that larger chromosomes have a higher heterogeneity, we used the variance in broken stick segment sizes as a proxy of recombination heterogeneity. But we need to correct for species random effect.

3.2.3 Broken stick variance

# Distribution of variables
hist(chromosome.stats$brokenstick_pvariance_scaled, breaks = 30, main = "", xlab = "Scaled broken stick variance")

Across species: Chromosomes pooled per species. Is there a common pattern across species?

nrow(species.stats)
## [1] 57
# Model selection Linear regression
lm_model = lm(pvariance ~ phys.map.length, data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = pvariance ~ phys.map.length, data = species.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91315 -0.53797 -0.01841  0.60257  1.73917 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.3064077  0.1279780  -2.394 0.020092 *  
## phys.map.length  0.0025801  0.0006431   4.012 0.000183 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8148 on 55 degrees of freedom
## Multiple R-squared:  0.2264, Adjusted R-squared:  0.2123 
## F-statistic:  16.1 on 1 and 55 DF,  p-value: 0.0001835
AIC(lm_model)
## [1] 142.3715
BIC(lm_model)
## [1] 148.5007
# Quadratic regression
lm_model = lm(pvariance ~ phys.map.length + I(phys.map.length^2), data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = pvariance ~ phys.map.length + I(phys.map.length^2), 
##     data = species.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.05672 -0.42745  0.00285  0.56475  1.65223 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          -5.621e-01  1.746e-01  -3.219  0.00218 **
## phys.map.length       8.273e-03  2.803e-03   2.952  0.00466 **
## I(phys.map.length^2) -8.921e-06  4.281e-06  -2.084  0.04193 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7911 on 54 degrees of freedom
## Multiple R-squared:  0.284,  Adjusted R-squared:  0.2574 
## F-statistic: 10.71 on 2 and 54 DF,  p-value: 0.0001211
AIC(lm_model)
## [1] 139.9634
BIC(lm_model)
## [1] 148.1356
The broken stick variance depends on the mean chromosome size across species (n = 38). The broken stick variance was log-transformed and scaled for comparison between species.

Figure 3.4: The broken stick variance depends on the mean chromosome size across species (n = 38). The broken stick variance was log-transformed and scaled for comparison between species.

Within species: Chromosome-level correlation.

## 
## Call:
## lm(formula = brokenstick_pvariance_scaled ~ log10(phys.map.length), 
##     data = chromosome.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.08746 -0.53587  0.06989  0.62189  1.81720 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -1.79180    0.13431  -13.34   <2e-16 ***
## log10(phys.map.length)  1.06713    0.07731   13.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8801 on 650 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.2267, Adjusted R-squared:  0.2255 
## F-statistic: 190.5 on 1 and 650 DF,  p-value: < 2.2e-16
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## brokenstick_pvariance_scaled ~ log10(phys.map.length) + (log10(phys.map.length) |  
##     species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 1259
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7931 -0.5218  0.0826  0.5785  2.9358 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr 
##  species  (Intercept)            4.4676   2.1137        
##           log10(phys.map.length) 1.8818   1.3718   -0.93
##  Residual                        0.2859   0.5347        
## Number of obs: 652, groups:  species, 57
## 
## Fixed effects:
##                         Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)            -0.004562   0.426866 38.830563  -0.011    0.992
## log10(phys.map.length) -0.118446   0.266835 37.912192  -0.444    0.660
## 
## Correlation of Fixed Effects:
##             (Intr)
## lg10(phy..) -0.961
##              R2m       R2c
## [1,] 0.002007696 0.7944873

Indeed, the adjusted R-squared of the linear model is low and the mixed model explains a low proportion of inter-species effect (R2m, marginal \(R_2\)), with a non-significant coefficient.

(ref:relativechromosomesize-linkagemaplength)

Figure 3.5: (ref:relativechromosomesize-linkagemaplength)

3.2.4 Coefficient of variation

# Distribution of variables
hist(chromosome.stats$cv.recrate, breaks = 30, main = "", xlab = "Coefficient of variation")

Across species: Chromosomes pooled per species.

nrow(species.stats)
## [1] 57
# Model selection Linear regression
lm_model = lm(cv.recrate ~ phys.map.length, data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = cv.recrate ~ phys.map.length, data = species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6389 -0.2502 -0.1124  0.2086  1.6000 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.7934943  0.0606156  13.091  < 2e-16 ***
## phys.map.length 0.0012845  0.0003046   4.217 9.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3859 on 55 degrees of freedom
## Multiple R-squared:  0.2443, Adjusted R-squared:  0.2306 
## F-statistic: 17.78 on 1 and 55 DF,  p-value: 9.308e-05
AIC(lm_model)
## [1] 57.17865
BIC(lm_model)
## [1] 63.3078
# Quadratic regression
lm_model = lm(cv.recrate ~ phys.map.length + I(phys.map.length^2), data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = cv.recrate ~ phys.map.length + I(phys.map.length^2), 
##     data = species.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06646 -0.20389 -0.04645  0.23105  1.17056 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           6.013e-01  7.749e-02   7.759 2.41e-10 ***
## phys.map.length       5.565e-03  1.244e-03   4.475 3.99e-05 ***
## I(phys.map.length^2) -6.707e-06  1.900e-06  -3.531 0.000857 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3511 on 54 degrees of freedom
## Multiple R-squared:  0.386,  Adjusted R-squared:  0.3633 
## F-statistic: 16.98 on 2 and 54 DF,  p-value: 1.904e-06
AIC(lm_model)
## [1] 47.34021
BIC(lm_model)
## [1] 55.51242
## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties
## [1] -0.1833778
##       2.5% 
## -0.2927582
##      97.5% 
## -0.0716899

Selected model for simplicity

# The mean species correlation estimated by 1,000 bootstraps
mean(boot)
## [1] -0.1833778
quantile(boot, 0.025)
##       2.5% 
## -0.2927582
quantile(boot, 0.975)
##      97.5% 
## -0.0716899
# Species correlations

summary(species.stats$correlation_cvrecrate)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.00000 -0.50000 -0.25000 -0.18266  0.07143  1.00000
cor.test(species.stats$cv.recrate, species.stats$phys.map.length, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  species.stats$cv.recrate and species.stats$phys.map.length
## S = 13928, p-value = 1.398e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.5486129
# Linear regression
lm_model = lm(cv.recrate ~ phys.map.length, data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = cv.recrate ~ phys.map.length, data = species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6389 -0.2502 -0.1124  0.2086  1.6000 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.7934943  0.0606156  13.091  < 2e-16 ***
## phys.map.length 0.0012845  0.0003046   4.217 9.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3859 on 55 degrees of freedom
## Multiple R-squared:  0.2443, Adjusted R-squared:  0.2306 
## F-statistic: 17.78 on 1 and 55 DF,  p-value: 9.308e-05
The coefficient of variation depends on the mean chromosome size across species (n = 38). The broken stick variance was log-transformed and scaled for comparison between species. The inset presents the distribution of Spearman’s correlation within species between the coefficient of variation and chromosome length. Mean correlation and 95% C.I. estimated by 1,000 bootstraps.

Figure 3.6: The coefficient of variation depends on the mean chromosome size across species (n = 38). The broken stick variance was log-transformed and scaled for comparison between species. The inset presents the distribution of Spearman’s correlation within species between the coefficient of variation and chromosome length. Mean correlation and 95% C.I. estimated by 1,000 bootstraps.

Within species: Chromosome-level correlation.

## 
## Call:
## lm(formula = cv.recrate ~ log10(phys.map.length), data = chromosome.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0652 -0.2445 -0.0515  0.1790  4.0242 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.02402    0.06051   0.397    0.692    
## log10(phys.map.length)  0.54023    0.03493  15.467   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.401 on 663 degrees of freedom
## Multiple R-squared:  0.2652, Adjusted R-squared:  0.264 
## F-statistic: 239.2 on 1 and 663 DF,  p-value: < 2.2e-16
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: cv.recrate ~ log10(phys.map.length) + (log10(phys.map.length) |  
##     species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 216.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9743 -0.4893 -0.0659  0.4373  9.7273 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr 
##  species  (Intercept)            0.74952  0.8657        
##           log10(phys.map.length) 0.36995  0.6082   -0.96
##  Residual                        0.06028  0.2455        
## Number of obs: 665, groups:  species, 57
## 
## Fixed effects:
##                        Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)              0.4867     0.1744 24.1775   2.790   0.0101 *
## log10(phys.map.length)   0.2099     0.1145 33.0175   1.833   0.0758 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## lg10(phy..) -0.975
##             R2m       R2c
## [1,] 0.03775708 0.7398221

For the coefficient of variation, the adjusted R-squared of the linear model is low and the mixed model explains a low proportion of inter-species effect (R2m, marginal \(R_2\)), as well as for the broken stick model.

The coefficient of variation within species does not always vary with chromosome size.

Figure 3.7: The coefficient of variation within species does not always vary with chromosome size.

3.3 Species-specific heterogeneity of recombination landscapes

Test explicitly the hypothesis that recombination landscapes are species specific. The variance in broken stick proportions and coefficient of variation should be correlated to the species.

# Test the species effect
aov_mod = aov(brokenstick_pvariance_scaled ~ species, data = chromosome.stats)
summary(aov_mod)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## species      56  466.9   8.338   26.95 <2e-16 ***
## Residuals   595  184.1   0.309                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 13 observations deleted due to missingness
lm_mod = lm(brokenstick_pvariance_scaled ~ species, data = chromosome.stats)
summary(lm_mod)
## 
## Call:
## lm(formula = brokenstick_pvariance_scaled ~ species, data = chromosome.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.81084 -0.27685  0.03808  0.32639  1.65400 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     1.31771    0.39333   3.350 0.000859 ***
## speciesArabidopsis thaliana    -2.54768    0.46539  -5.474 6.48e-08 ***
## speciesArachis duranensis       0.09747    0.44599   0.219 0.827077    
## speciesArachis hypogaea        -1.55533    0.41460  -3.751 0.000193 ***
## speciesBoechera stricta        -2.68769    0.44599  -6.026 2.94e-09 ***
## speciesBrachypodium distachyon -1.37641    0.50778  -2.711 0.006909 ** 
## speciesBrassica napus          -1.46820    0.42759  -3.434 0.000637 ***
## speciesBrassica rapa           -2.08077    0.44599  -4.665 3.81e-06 ***
## speciesCamelina sativa         -2.00421    0.42048  -4.766 2.36e-06 ***
## speciesCamellia sinensis       -2.53904    0.46539  -5.456 7.16e-08 ***
## speciesCapsella rubella        -2.17034    0.44599  -4.866 1.46e-06 ***
## speciesCapsicum annuum         -0.43555    0.42759  -1.019 0.308797    
## speciesCenchrus americanus     -0.09636    0.44599  -0.216 0.829012    
## speciesCitrullus lanatus       -1.59629    0.43087  -3.705 0.000231 ***
## speciesCitrus sinensis         -0.96389    0.43975  -2.192 0.028773 *  
## speciesCoffea canephora        -0.43031    0.42759  -1.006 0.314649    
## speciesCucumis melo            -0.73225    0.42759  -1.713 0.087323 .  
## speciesCucumis sativus         -3.46655    0.44599  -7.773 3.39e-14 ***
## speciesCucurbita maxima        -2.59422    0.41460  -6.257 7.50e-10 ***
## speciesCucurbita pepo          -2.39334    0.42048  -5.692 1.97e-08 ***
## speciesDioscorea alata         -1.03498    0.42250  -2.450 0.014586 *  
## speciesDraba nivalis           -2.03537    0.45417  -4.481 8.89e-06 ***
## speciesElaeis guineensis       -2.41353    0.43087  -5.602 3.25e-08 ***
## speciesEucalyptus grandis      -1.97041    0.42759  -4.608 4.97e-06 ***
## speciesGlycine max             -0.52113    0.41351  -1.260 0.208071    
## speciesGossypium hirsutum      -0.59942    0.40817  -1.469 0.142484    
## speciesGossypium raimondii     -0.86785    0.42250  -2.054 0.040403 *  
## speciesHelianthus annuus       -0.09566    0.42250  -0.226 0.820964    
## speciesHordeum vulgare          0.20261    0.44599   0.454 0.649786    
## speciesJuglans regia           -1.79731    0.42048  -4.274 2.23e-05 ***
## speciesLupinus albus           -1.43969    0.40876  -3.522 0.000461 ***
## speciesLupinus angustifolius   -0.52135    0.41252  -1.264 0.206792    
## speciesMalus domestica         -1.96454    0.41582  -4.724 2.88e-06 ***
## speciesMangifera indica        -1.73166    0.41252  -4.198 3.11e-05 ***
## speciesManihot esculenta       -1.98331    0.41460  -4.784 2.17e-06 ***
## speciesMomordica charantia     -1.29220    0.42759  -3.022 0.002619 ** 
## speciesNelumbo nucifera        -2.60544    0.45417  -5.737 1.54e-08 ***
## speciesOryza nivara            -1.76165    0.42484  -4.147 3.87e-05 ***
## speciesOryza sativa            -2.10905    0.42484  -4.964 9.02e-07 ***
## speciesPanicum hallii          -0.11654    0.43484  -0.268 0.788788    
## speciesPhaseolus vulgaris      -0.26853    0.42759  -0.628 0.530236    
## speciesPrunus mume             -2.56481    0.44599  -5.751 1.42e-08 ***
## speciesPrunus persica          -2.21901    0.43975  -5.046 6.00e-07 ***
## speciesQuercus sp              -2.44718    0.42484  -5.760 1.35e-08 ***
## speciesRaphanus sativus        -1.33445    0.45417  -2.938 0.003429 ** 
## speciesSesamum indicum         -2.11975    0.42250  -5.017 6.93e-07 ***
## speciesSetaria italica         -1.30421    0.43484  -2.999 0.002819 ** 
## speciesSolanum lycopersicum     0.28861    0.42484   0.679 0.497186    
## speciesSolanum tuberosum       -0.36958    0.42484  -0.870 0.384693    
## speciesSorghum bicolor         -0.66342    0.43087  -1.540 0.124155    
## speciesTheobroma cacao         -1.37108    0.43087  -3.182 0.001538 ** 
## speciesTriticum aestivum       -0.34092    0.41252  -0.826 0.408890    
## speciesTriticum dicoccoides    -0.13988    0.43087  -0.325 0.745556    
## speciesTriticum urartu         -0.60336    0.44599  -1.353 0.176614    
## speciesVigna unguiculata       -1.14472    0.42759  -2.677 0.007630 ** 
## speciesVitis vinifera          -1.69584    0.41351  -4.101 4.68e-05 ***
## speciesZea mays                -0.84711    0.43087  -1.966 0.049756 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5562 on 595 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.7172, Adjusted R-squared:  0.6906 
## F-statistic: 26.95 on 56 and 595 DF,  p-value: < 2.2e-16
# Test the species effect
aov_mod = aov(cv.recrate ~ species, data = chromosome.stats)
summary(aov_mod)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## species      56  105.5  1.8837   28.92 <2e-16 ***
## Residuals   608   39.6  0.0651                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lm_mod = lm(cv.recrate ~ species, data = chromosome.stats)
summary(lm_mod)
## 
## Call:
## lm(formula = cv.recrate ~ species, data = chromosome.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00545 -0.11579 -0.01508  0.11122  2.70661 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     1.55895    0.18047   8.638  < 2e-16 ***
## speciesArabidopsis thaliana    -1.08520    0.21353  -5.082 4.97e-07 ***
## speciesArachis duranensis       0.03361    0.20177   0.167 0.867779    
## speciesArachis hypogaea        -0.75205    0.19023  -3.953 8.61e-05 ***
## speciesBoechera stricta        -1.08469    0.20463  -5.301 1.62e-07 ***
## speciesBrachypodium distachyon -0.70161    0.23298  -3.011 0.002708 ** 
## speciesBrassica napus          -0.81326    0.19493  -4.172 3.46e-05 ***
## speciesBrassica rapa           -0.95514    0.20463  -4.668 3.75e-06 ***
## speciesCamelina sativa         -1.00457    0.19212  -5.229 2.35e-07 ***
## speciesCamellia sinensis       -1.11900    0.21353  -5.240 2.21e-07 ***
## speciesCapsella rubella        -0.90493    0.20463  -4.422 1.16e-05 ***
## speciesCapsicum annuum         -0.18093    0.19619  -0.922 0.356778    
## speciesCenchrus americanus      1.12162    0.20463   5.481 6.20e-08 ***
## speciesCitrullus lanatus       -0.83471    0.19769  -4.222 2.79e-05 ***
## speciesCitrus sinensis         -0.52343    0.20177  -2.594 0.009711 ** 
## speciesCoffea canephora        -0.39652    0.19619  -2.021 0.043710 *  
## speciesCucumis melo            -0.41467    0.19619  -2.114 0.034954 *  
## speciesCucumis sativus         -1.18774    0.20463  -5.804 1.04e-08 ***
## speciesCucurbita maxima        -1.00608    0.18973  -5.303 1.60e-07 ***
## speciesCucurbita pepo          -0.94488    0.19079  -4.952 9.51e-07 ***
## speciesDioscorea alata         -0.29643    0.19293  -1.536 0.124942    
## speciesDraba nivalis           -0.97952    0.20463  -4.787 2.13e-06 ***
## speciesElaeis guineensis       -1.00737    0.19619  -5.135 3.81e-07 ***
## speciesEucalyptus grandis      -0.85847    0.19619  -4.376 1.42e-05 ***
## speciesGlycine max             -0.43668    0.18973  -2.302 0.021695 *  
## speciesGossypium hirsutum      -0.29799    0.18728  -1.591 0.112098    
## speciesGossypium raimondii     -0.32976    0.19386  -1.701 0.089438 .  
## speciesHelianthus annuus        0.21115    0.19386   1.089 0.276499    
## speciesHordeum vulgare          0.10148    0.20463   0.496 0.620138    
## speciesJuglans regia           -0.90768    0.19212  -4.724 2.87e-06 ***
## speciesLupinus albus           -0.86176    0.18755  -4.595 5.27e-06 ***
## speciesLupinus angustifolius   -0.23931    0.18928  -1.264 0.206587    
## speciesMalus domestica         -0.99378    0.19079  -5.209 2.61e-07 ***
## speciesMangifera indica        -0.89565    0.18928  -4.732 2.77e-06 ***
## speciesManihot esculenta       -0.87401    0.19023  -4.594 5.28e-06 ***
## speciesMomordica charantia     -0.78715    0.19619  -4.012 6.77e-05 ***
## speciesNelumbo nucifera        -1.07501    0.20177  -5.328 1.40e-07 ***
## speciesOryza nivara            -0.87619    0.19493  -4.495 8.33e-06 ***
## speciesOryza sativa            -0.97567    0.19493  -5.005 7.32e-07 ***
## speciesPanicum hallii          -0.41636    0.19952  -2.087 0.037316 *  
## speciesPhaseolus vulgaris      -0.22964    0.19619  -1.171 0.242257    
## speciesPrunus mume             -1.03932    0.20463  -5.079 5.06e-07 ***
## speciesPrunus persica          -1.05811    0.20177  -5.244 2.17e-07 ***
## speciesQuercus sp              -1.01806    0.19493  -5.223 2.42e-07 ***
## speciesRaphanus sativus        -0.89503    0.20839  -4.295 2.03e-05 ***
## speciesSesamum indicum         -0.96275    0.19386  -4.966 8.88e-07 ***
## speciesSetaria italica         -0.64874    0.19952  -3.252 0.001211 ** 
## speciesSolanum lycopersicum     0.05541    0.19493   0.284 0.776311    
## speciesSolanum tuberosum       -0.50016    0.19493  -2.566 0.010531 *  
## speciesSorghum bicolor         -0.60795    0.19769  -3.075 0.002198 ** 
## speciesTheobroma cacao         -0.75106    0.19769  -3.799 0.000160 ***
## speciesTriticum aestivum       -0.09606    0.18928  -0.508 0.611966    
## speciesTriticum dicoccoides    -0.02178    0.19769  -0.110 0.912328    
## speciesTriticum urartu         -0.53515    0.20463  -2.615 0.009139 ** 
## speciesVigna unguiculata       -0.75591    0.19619  -3.853 0.000129 ***
## speciesVitis vinifera          -0.77720    0.18973  -4.096 4.77e-05 ***
## speciesZea mays                -0.40769    0.19769  -2.062 0.039611 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2552 on 608 degrees of freedom
## Multiple R-squared:  0.727,  Adjusted R-squared:  0.7019 
## F-statistic: 28.92 on 56 and 608 DF,  p-value: < 2.2e-16

3.4 COs are biased toward the periphery for many species

Estimate the bias toward the periphery.

# Estimate the mean periphery-bias ratio with bootstrapped C.I.
nboot = 1000
boot = numeric(nboot)
for (i in 1:nboot) {
    boot[i] = mean(sample(chromosome.stats$peripherybias_ratio, replace = TRUE), 
        na.rm = TRUE)
}
mean(boot)
## [1] 2.187515
quantile(boot, 0.025)
##     2.5% 
## 2.068109
quantile(boot, 0.975)
##    97.5% 
## 2.329313
# Results
hist(chromosome.stats$peripherybias_ratio, xlab = "Periphery-bias ratio", breaks = 30, 
    main = "")
abline(v = 1, col = "black")
abline(v = mean(boot), col = "red", lwd = 2)
abline(v = quantile(boot, 0.025), lty = "dashed", col = "red", lwd = 2)
abline(v = quantile(boot, 0.975), lty = "dashed", col = "red", lwd = 2)

# How many maps have a periphery-bias ratio over 1 (i.e. more recombination at
# the tips than at the center)?
sum(chromosome.stats$peripherybias_ratio > 1, na.rm = TRUE)
## [1] 484
# And in proportion
sum(chromosome.stats$peripherybias_ratio > 1, na.rm = TRUE)/nrow(chromosome.stats)
## [1] 0.7278195
# How many maps have a periphery-bias ratio under 1 (i.e. more recombination at
# the center than on the tips)?
sum(chromosome.stats$peripherybias_ratio < 1, na.rm = TRUE)
## [1] 169
# And in proportion
sum(chromosome.stats$peripherybias_ratio < 1, na.rm = TRUE)/nrow(chromosome.stats)
## [1] 0.2541353

Sample size.

# Sample size
nrow(species.stats)
## [1] 57

Test which species have a significant periphery-bias ratio (supplementary table).

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties

## Warning in
## cor.test.default(chromosome.stats$phys.map.length[which(chromosome.stats$species
## == : Cannot compute exact p-value with ties
## [1] 0.1201095
##       2.5% 
## 0.01758947
##     97.5% 
## 0.2195359

kable(species.stats[, c(1:4, 9:10)], caption = "Species correlation between the periphery-bias ratio and chromosome length.", 
    labels = c("Species", "Chromosome number", "Chromosome size", "Mean periphery-bias", 
        "Spearman correlation", "p-value"), align = "c")
Table 3.1: Species correlation between the periphery-bias ratio and chromosome length.
species chromosome_number phys.map.length peripherybias_ratio correlation_peripherybias correlation_peripherybias_p
Aegilops speltoides 2 517.44324 5.7567440 -1.0000000 1.0000000
Arabidopsis thaliana 5 23.82927 1.6017099 -0.7000000 0.2333333
Arachis duranensis 8 100.95097 4.6362972 0.8095238 0.0217758
Arachis hypogaea 18 124.68497 2.0663790 0.1682147 0.5032006
Boechera stricta 7 26.52555 1.5082194 -0.0714286 0.9063492
Brachypodium distachyon 3 45.62172 2.6055663 0.5000000 1.0000000
Brassica napus 12 44.71564 1.6530123 0.2242424 0.5366881
Brassica rapa 7 26.14066 1.5540350 0.0000000 1.0000000
Camelina sativa 15 30.63062 1.3559678 -0.1118881 0.7327754
Camellia sinensis 5 222.10960 1.1966508 0.5000000 1.0000000
Capsella rubella 7 15.83906 0.6845333 -0.1428571 0.8027778
Capsicum annuum 11 227.12175 3.1911317 -0.2272727 0.5030670
Cenchrus americanus 7 223.50536 5.3945568 0.5357143 0.2357143
Citrullus lanatus 10 33.59728 1.3777785 0.6121212 0.0664691
Citrus sinensis 8 26.02017 2.1229883 -0.1666667 0.7033234
Coffea canephora 11 53.00321 2.8290326 0.0727273 0.8388249
Cucumis melo 11 30.99121 2.6899882 0.4363636 0.1825319
Cucumis sativus 7 27.40843 0.6934489 0.7142857 0.0880952
Cucurbita maxima 19 10.34886 0.6825170 0.1053922 0.6872983
Cucurbita pepo 17 10.78700 0.9581566 0.0441176 0.8685676
Dioscorea alata 14 23.84612 3.2114984 -0.1384615 0.6375158
Draba nivalis 7 34.11541 1.4252100 0.1785714 0.7130952
Elaeis guineensis 11 46.94000 0.8058865 0.0181818 0.9728412
Eucalyptus grandis 11 55.68998 1.1661899 0.3818182 0.2483806
Glycine max 19 48.12719 2.4895647 0.0438596 0.8598111
Gossypium hirsutum 26 85.92276 3.7541280 0.3394872 0.0902240
Gossypium raimondii 13 57.63293 3.8525660 0.2197802 0.4703526
Helianthus annuus 13 169.70400 4.4133912 0.3956044 0.1821958
Hordeum vulgare 7 654.85949 4.3176242 -0.3214286 0.4976190
Juglans regia 15 47.43156 1.0989006 -0.0392857 0.8929284
Lupinus albus 25 17.35603 1.4352301 0.1561538 0.4543071
Lupinus angustifolius 20 27.89545 3.7114974 0.2120301 0.3678291
Malus domestica 17 38.38249 1.4088867 -0.2622549 0.3079967
Mangifera indica 20 17.87202 0.8990063 0.3834586 0.0960014
Manihot esculenta 18 28.80229 0.3127434 -0.0392359 0.8771588
Momordica charantia 11 26.50729 0.9642987 0.0607906 0.8675110
Nelumbo nucifera 8 97.84551 1.0173700 0.1428571 0.7520337
Oryza nivara 12 28.16253 1.8005960 -0.0489510 0.8863351
Oryza sativa 12 31.10379 1.2220253 0.4125874 0.1844807
Panicum hallii 9 56.37928 1.8065166 -0.5666667 0.1205743
Phaseolus vulgaris 11 46.80187 3.5497073 0.5363636 0.0936425
Prunus mume 7 24.98833 1.1443345 -0.3571429 0.4444444
Prunus persica 8 28.21185 1.2991090 0.3809524 0.3598710
Quercus sp 12 59.72765 1.0266447 -0.2447552 0.4435043
Raphanus sativus 6 38.78895 1.2223674 0.7714286 0.1027778
Sesamum indicum 13 16.40985 0.7610006 0.5879121 0.0381070
Setaria italica 9 44.58849 2.6615801 0.3666667 0.3362599
Solanum lycopersicum 12 67.26872 4.3148913 0.5804196 0.0520862
Solanum tuberosum 12 60.41812 2.4960328 0.4615385 0.1338363
Sorghum bicolor 10 68.36450 2.0348211 -0.6484848 0.0490426
Theobroma cacao 10 33.04562 2.0835986 0.1878788 0.6075669
Triticum aestivum 20 667.82536 3.9899830 0.1353383 0.5681081
Triticum dicoccoides 10 582.40116 4.6545740 -0.3608715 0.3056130
Triticum urartu 7 665.94253 2.3667467 0.0714286 0.9063492
Vigna unguiculata 11 43.04148 1.9663878 0.5363636 0.0936425
Vitis vinifera 19 22.43032 1.7051993 0.0175439 0.9454163
Zea mays 10 210.63381 3.5198619 0.0303030 0.9457098
S6_peripherybias = species.stats[, c(1:4, 9:10)]
colnames(S6_peripherybias) = c("Species", "Chromosome number", "Chromosome size", 
    "Mean periphery-bias", "Spearman correlation", "p-value")

write.xlsx(x = S6_peripherybias, file = paste(wd, "tables/article_one_supp/tables_supplementary.xls", 
    sep = ""), sheetName = "S6_peripherybias", row.names = FALSE, append = TRUE)
rm(S6_peripherybias)

Is the correlation between the periphery-bias ratio and chromosome size significant across species? Chromosomes pooled per species.

nrow(species.stats)
## [1] 57
# Model selection Linear regression
lm_model = lm(peripherybias_ratio ~ phys.map.length, data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = peripherybias_ratio ~ phys.map.length, data = species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3207 -0.8261 -0.2908  0.6577  2.6611 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.746358   0.176310   9.905 7.85e-14 ***
## phys.map.length 0.004417   0.000886   4.985 6.54e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.122 on 55 degrees of freedom
## Multiple R-squared:  0.3112, Adjusted R-squared:  0.2987 
## F-statistic: 24.85 on 1 and 55 DF,  p-value: 6.536e-06
AIC(lm_model)
## [1] 178.8955
BIC(lm_model)
## [1] 185.0246
# Quadratic regression
lm_model = lm(peripherybias_ratio ~ phys.map.length + I(phys.map.length^2), data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = peripherybias_ratio ~ phys.map.length + I(phys.map.length^2), 
##     data = species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8388 -0.6438 -0.2247  0.5360  2.0807 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.158e+00  2.226e-01   5.203 3.11e-06 ***
## phys.map.length       1.751e-02  3.573e-03   4.901 9.09e-06 ***
## I(phys.map.length^2) -2.052e-05  5.458e-06  -3.760  0.00042 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.009 on 54 degrees of freedom
## Multiple R-squared:  0.4541, Adjusted R-squared:  0.4339 
## F-statistic: 22.46 on 2 and 54 DF,  p-value: 7.985e-08
AIC(lm_model)
## [1] 167.6433
BIC(lm_model)
## [1] 175.8155

Selected model for simplicity: LM

# The mean species correlation estimated by 1,000 bootstraps
mean(boot)
## [1] 0.1201095
quantile(boot, 0.025)
##       2.5% 
## 0.01758947
quantile(boot, 0.975)
##     97.5% 
## 0.2195359
# Species correlations
summary(species.stats$correlation_peripherybias)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.00000 -0.07143  0.13534  0.12078  0.39560  0.80952
cor.test(species.stats$peripherybias_ratio, species.stats$phys.map.length, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  species.stats$peripherybias_ratio and species.stats$phys.map.length
## S = 12226, p-value = 1.175e-06
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6037724
# Linear regression
lm_model = lm(peripherybias_ratio ~ log10(phys.map.length), data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = peripherybias_ratio ~ log10(phys.map.length), data = species.stats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.24893 -0.47523 -0.09788  0.53007  2.06181 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -1.2315     0.5314  -2.317   0.0242 *  
## log10(phys.map.length)   1.9931     0.2972   6.705 1.15e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.003 on 55 degrees of freedom
## Multiple R-squared:  0.4498, Adjusted R-squared:  0.4398 
## F-statistic: 44.96 on 1 and 55 DF,  p-value: 1.145e-08
## [1] 2.190423
##     2.5% 
## 2.062457
##    97.5% 
## 2.322344
(ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

Figure 3.8: (ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

(ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

Figure 3.9: (ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

lm.mod = lm(peripherybias_ratio ~ linkage.map.length, data = tmp_species.stats)
summary(lm.mod)
## 
## Call:
## lm(formula = peripherybias_ratio ~ linkage.map.length, data = tmp_species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9444 -1.0036 -0.3173  0.9565  3.3716 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        1.780529   0.543201   3.278  0.00182 **
## linkage.map.length 0.003550   0.004158   0.854  0.39688   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.344 on 55 degrees of freedom
## Multiple R-squared:  0.01308,    Adjusted R-squared:  -0.004861 
## F-statistic: 0.7291 on 1 and 55 DF,  p-value: 0.3969
Linkage map length (i.e. crossover frequency) does not correlate with the periphery-bias ratio.

Figure 3.10: Linkage map length (i.e. crossover frequency) does not correlate with the periphery-bias ratio.

Is the relationship between chromosome size and the periphery-bias ratio universal within species?

qqPlot(sqrt(chromosome.stats$peripherybias_ratio), ylab = "Periphery-bias ratio")

## [1] 103 310
cor.test(log10(chromosome.stats$phys.map.length), sqrt(chromosome.stats$peripherybias_ratio), 
    method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  log10(chromosome.stats$phys.map.length) and sqrt(chromosome.stats$peripherybias_ratio)
## S = 23534674, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4928681
# LINEAR MODEL
lm_model = lm(sqrt(peripherybias_ratio) ~ log10(phys.map.length), data = chromosome.stats)
summary(lm_model)
## 
## Call:
## lm(formula = sqrt(peripherybias_ratio) ~ log10(phys.map.length), 
##     data = chromosome.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3270 -0.3129 -0.0238  0.3091  1.3359 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.27550    0.07516   3.666 0.000267 ***
## log10(phys.map.length)  0.64989    0.04333  15.000  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4927 on 651 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.2568, Adjusted R-squared:  0.2557 
## F-statistic:   225 on 1 and 651 DF,  p-value: < 2.2e-16
AIC(lm_model)
## [1] 932.7827
# LINEAR MIXED MODEL
lmer_model = lmer(sqrt(peripherybias_ratio) ~ log10(phys.map.length) + (1 | species), 
    data = chromosome.stats)
summary(lmer_model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: sqrt(peripherybias_ratio) ~ log10(phys.map.length) + (1 | species)
##    Data: chromosome.stats
## 
## REML criterion at convergence: 726.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8023 -0.5813  0.0342  0.5713  3.0047 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 0.09366  0.3060  
##  Residual             0.14790  0.3846  
## Number of obs: 653, groups:  species, 57
## 
## Fixed effects:
##                         Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)              0.25031    0.14938 102.50053   1.676   0.0968 .  
## log10(phys.map.length)   0.65566    0.08353 112.44430   7.849 2.67e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## lg10(phy..) -0.956
AIC(lmer_model)
## [1] 734.4097
The periphery-bias ratio increases with chromosome size across species, but the relationship does not stand within species.

Figure 3.11: The periphery-bias ratio increases with chromosome size across species, but the relationship does not stand within species.

3.4.1 Sensitivity of the metric (periphery-bias ratio) to the sampling scale.

Sensitivity of the periphery-bias ratio metric to the sampling scale (i.e. the number of bins sampled at the tip of the chromosome). The periphery-bias ratio seems steady with an increasing number of bins sampled, though the ratio is slightly decreasing when more bins are sampled.

Figure 3.12: Sensitivity of the periphery-bias ratio metric to the sampling scale (i.e. the number of bins sampled at the tip of the chromosome). The periphery-bias ratio seems steady with an increasing number of bins sampled, though the ratio is slightly decreasing when more bins are sampled.

3.4.1.1 Figure S12

4 Telomeres and centromeres both influence the distribution of crossovers

4.1 Telomere effect

Recombination rates increase with the distance to the nearest telomere.

Correlations between recombination rates and distance to the nearest telomere.

Test which species have a significant correlation with the distance to the telomere (supplementary table).

S7_correlationchromosomes = cor_dist2telomere[, 1:4]
colnames(S7_correlationchromosomes) = c("Species", "Chromosome", "Spearman correlation", 
    "p-value")

write.xlsx(x = S7_correlationchromosomes, file = paste(wd, "tables/article_one_supp/tables_supplementary.xls", 
    sep = ""), sheetName = "S7_correlationchromosomes", row.names = FALSE, append = TRUE)
rm(S7_correlationchromosomes)

Correlations at species and chromosome level

# Sample size
nrow(species.stats)
## [1] 57
kable(species.stats[, c(1:3, 11)], caption = "Species correlation between the distance to the nearest telomere and the recombination rate.", 
    labels = c("Species", "Chromosome number", "Chromosome size", "Spearman correlation"), 
    align = "c")
Table 4.1: Species correlation between the distance to the nearest telomere and the recombination rate.
species chromosome_number phys.map.length correlation_dist2telomere
Aegilops speltoides 2 517.44324 -0.9447383
Arabidopsis thaliana 5 23.82927 -0.1925520
Arachis duranensis 8 100.95097 -0.5861152
Arachis hypogaea 18 124.68497 -0.7865345
Boechera stricta 7 26.52555 -0.4861575
Brachypodium distachyon 3 45.62172 -0.7064544
Brassica napus 12 44.71564 -0.3911250
Brassica rapa 7 26.14066 -0.5416268
Camelina sativa 15 30.63062 -0.3028815
Camellia sinensis 5 222.10960 -0.0965251
Capsella rubella 7 15.83906 0.5766299
Capsicum annuum 11 227.12175 -0.8177649
Cenchrus americanus 7 223.50536 -0.3117457
Citrullus lanatus 10 33.59728 -0.4349560
Citrus sinensis 8 26.02017 -0.6789178
Coffea canephora 11 53.00321 -0.7962224
Cucumis melo 11 30.99121 -0.7843992
Cucumis sativus 7 27.40843 -0.0274564
Cucurbita maxima 19 10.34886 -0.2031031
Cucurbita pepo 17 10.78700 -0.1623203
Dioscorea alata 14 23.84612 -0.7774816
Draba nivalis 7 34.11541 -0.7419194
Elaeis guineensis 11 46.94000 0.2496445
Eucalyptus grandis 11 55.68998 -0.3969593
Glycine max 19 48.12719 -0.6771574
Gossypium hirsutum 26 85.92276 -0.8310385
Gossypium raimondii 13 57.63293 -0.7418663
Helianthus annuus 13 169.70400 -0.6588514
Hordeum vulgare 7 654.85949 -0.8248608
Juglans regia 15 47.43156 -0.4387363
Lupinus albus 25 17.35603 -0.5677997
Lupinus angustifolius 20 27.89545 -0.7649821
Malus domestica 17 38.38249 -0.6837059
Mangifera indica 20 17.87202 -0.2076802
Manihot esculenta 18 28.80229 -0.0641793
Momordica charantia 11 26.50729 -0.4024631
Nelumbo nucifera 8 97.84551 0.1356657
Oryza nivara 12 28.16253 -0.5404312
Oryza sativa 12 31.10379 -0.5798772
Panicum hallii 9 56.37928 -0.8367877
Phaseolus vulgaris 11 46.80187 -0.6929217
Prunus mume 7 24.98833 -0.1507408
Prunus persica 8 28.21185 -0.6062939
Quercus sp 12 59.72765 -0.0568060
Raphanus sativus 6 38.78895 -0.1157651
Sesamum indicum 13 16.40985 -0.0026683
Setaria italica 9 44.58849 -0.7556303
Solanum lycopersicum 12 67.26872 -0.7020987
Solanum tuberosum 12 60.41812 -0.6201034
Sorghum bicolor 10 68.36450 -0.7765123
Theobroma cacao 10 33.04562 -0.7852950
Triticum aestivum 20 667.82536 -0.8434463
Triticum dicoccoides 10 582.40116 -0.8247954
Triticum urartu 7 665.94253 -0.8544557
Vigna unguiculata 11 43.04148 -0.6674743
Vitis vinifera 19 22.43032 -0.4937258
Zea mays 10 210.63381 -0.8078647
S8_correlationspecies = species.stats[, c(1:3, 11)]
colnames(S8_correlationspecies) = c("Species", "Chromosome number", "Chromosome size", 
    "Spearman correlation")

write.xlsx(x = S8_correlationspecies, file = paste(wd, "tables/article_one_supp/tables_supplementary.xls", 
    sep = ""), sheetName = "S8_correlationspecies", row.names = FALSE, append = TRUE)
rm(S8_correlationspecies)

Selected model for simplicity: LM

# The mean species correlation estimated by 1,000 bootstraps
mean(boot)
## [1] -0.5044995
quantile(boot, 0.025)
##       2.5% 
## -0.5868395
quantile(boot, 0.975)
##      97.5% 
## -0.4142717
# Species correlations
summary(species.stats$correlation_dist2telomere)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.9447 -0.7765 -0.6063 -0.5050 -0.3029  0.5766
cor.test(species.stats$correlation_dist2telomere, species.stats$phys.map.length, 
    method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  species.stats$correlation_dist2telomere and species.stats$phys.map.length
## S = 46712, p-value = 5.565e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5138709
# Linear regression
lm_model = lm(correlation_dist2telomere ~ phys.map.length, data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = correlation_dist2telomere ~ phys.map.length, data = species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3673 -0.2302 -0.0907  0.2341  1.0177 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.4299957  0.0485244  -8.861 3.52e-12 ***
## phys.map.length -0.0007009  0.0002438  -2.874  0.00574 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3089 on 55 degrees of freedom
## Multiple R-squared:  0.1306, Adjusted R-squared:  0.1148 
## F-statistic: 8.263 on 1 and 55 DF,  p-value: 0.005744
(ref:dist2telomere-species) The negative correlation between recombination rates and the distance to the nearest telomere is stronger for species with a larger chromosome size (n = 57). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. The inset presents the distribution of Spearman correlation coefficients for chromosomes (n = 665 chromosomes). The mean correlation and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The red vertical line is for a null correlation.

Figure 4.1: (ref:dist2telomere-species) The negative correlation between recombination rates and the distance to the nearest telomere is stronger for species with a larger chromosome size (n = 57). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. The inset presents the distribution of Spearman correlation coefficients for chromosomes (n = 665 chromosomes). The mean correlation and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The red vertical line is for a null correlation.

4.1.0.1 Figure S5

Interestingly, at a species level, we can identify two groups of species. Although most species correlations are globally significant (i.e. a majority of significant chromosomes), there were a group of high negative correlations (lower than -0.5) and another one of lower/null correlations (between -0.5 and 0.1).

(#tab:SpeciesCorrelationPercentage of significant correlationsChromosome length)Species correlation between chromosome size and distance to the nearest telomere
species correlation signif phys.map.length
Aegilops speltoides -0.94 1.00 517.44
Arabidopsis thaliana -0.19 0.40 23.83
Arachis duranensis -0.59 1.00 100.95
Arachis hypogaea -0.79 1.00 124.68
Boechera stricta -0.49 0.71 26.53
Brachypodium distachyon -0.71 1.00 45.62
Brassica napus -0.39 0.83 44.72
Brassica rapa -0.54 0.86 26.14
Camelina sativa -0.30 0.87 30.63
Camellia sinensis -0.10 1.00 222.11
Capsella rubella 0.58 0.86 15.84
Capsicum annuum -0.82 1.00 227.12
Cenchrus americanus -0.31 1.00 223.51
Citrullus lanatus -0.43 1.00 33.60
Citrus sinensis -0.68 0.88 26.02
Coffea canephora -0.80 1.00 53.00
Cucumis melo -0.78 1.00 30.99
Cucumis sativus -0.03 0.86 27.41
Cucurbita maxima -0.20 0.58 10.35
Cucurbita pepo -0.16 0.71 10.79
Dioscorea alata -0.78 1.00 23.85
Draba nivalis -0.74 1.00 34.12
Elaeis guineensis 0.25 0.82 46.94
Eucalyptus grandis -0.40 0.82 55.69
Glycine max -0.68 1.00 48.13
Gossypium hirsutum -0.83 1.00 85.92
Gossypium raimondii -0.74 1.00 57.63
Helianthus annuus -0.66 1.00 169.70
Hordeum vulgare -0.82 1.00 654.86
Juglans regia -0.44 1.00 47.43
Lupinus albus -0.57 0.92 17.36
Lupinus angustifolius -0.76 0.95 27.90
Malus domestica -0.68 1.00 38.38
Mangifera indica -0.21 0.70 17.87
Manihot esculenta -0.06 0.83 28.80
Momordica charantia -0.40 0.82 26.51
Nelumbo nucifera 0.14 1.00 97.85
Oryza nivara -0.54 1.00 28.16
Oryza sativa -0.58 0.83 31.10
Panicum hallii -0.84 1.00 56.38
Phaseolus vulgaris -0.69 1.00 46.80
Prunus mume -0.15 0.43 24.99
Prunus persica -0.61 1.00 28.21
Quercus sp -0.06 0.83 59.73
Raphanus sativus -0.12 1.00 38.79
Sesamum indicum 0.00 0.85 16.41
Setaria italica -0.76 1.00 44.59
Solanum lycopersicum -0.70 1.00 67.27
Solanum tuberosum -0.62 0.92 60.42
Sorghum bicolor -0.78 1.00 68.36
Theobroma cacao -0.79 1.00 33.05
Triticum aestivum -0.84 1.00 667.83
Triticum dicoccoides -0.82 1.00 582.40
Triticum urartu -0.85 1.00 665.94
Vigna unguiculata -0.67 1.00 43.04
Vitis vinifera -0.49 0.84 22.43
Zea mays -0.81 1.00 210.63

Finally, to identify an inter-specific recurrent pattern, chromosomes were pooled per species. The relationship between recombination rates and the distance toward telomeres was assessed by quadratic regression with a Liner Mixed Model specifying a random species effect. Coefficients of the model and their confidence intervals at 95% were estimated with parametric bootstrap.

Standardized recombination rate (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. Linear quadratic regression estimated with a Linear Model is the black line (95% confidence interval in dark grey).

Figure 4.2: Standardized recombination rate (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. Linear quadratic regression estimated with a Linear Model is the black line (95% confidence interval in dark grey).

To uncover specific effects that could ultimately differ from the quadratic regression, I plotted loess regression lines over the pooled data.

Standardized recombination rate (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. A loess regression was estimated for each species.

Figure 4.3: Standardized recombination rate (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. A loess regression was estimated for each species.

4.1.0.2 Figure S6

Classify species in three categories: telomere pattern, sub-telomere pattern and exceptions.

Exception patterns (6 species) tend to be higher at the center than on the rest of the chromosome. Interestingly, Nelumbo nucifera and Camellia sinensis are among these outlying patterns.

kable(telomere_pattern)
species pattern
Aegilops speltoides telomere
Arabidopsis thaliana exception
Arachis duranensis telomere
Arachis hypogaea telomere
Boechera stricta peak
Brachypodium distachyon telomere
Brassica napus telomere
Brassica rapa peak
Camelina sativa telomere
Camellia sinensis exception
Capsella rubella exception
Capsicum annuum telomere
Cenchrus americanus telomere
Citrullus lanatus telomere
Citrus sinensis telomere
Coffea canephora telomere
Cucumis melo telomere
Cucumis sativus peak
Cucurbita maxima peak
Cucurbita pepo peak
Dioscorea alata telomere
Draba nivalis peak
Elaeis guineensis peak
Eucalyptus grandis exception
Glycine max telomere
Gossypium hirsutum telomere
Gossypium raimondii telomere
Helianthus annuus telomere
Hordeum vulgare telomere
Juglans regia peak
Lupinus albus telomere
Lupinus angustifolius telomere
Malus domestica peak
Mangifera indica peak
Manihot esculenta peak
Momordica charantia peak
Nelumbo nucifera exception
Oryza nivara telomere
Oryza sativa telomere
Panicum hallii peak
Phaseolus vulgaris telomere
Prunus mume telomere
Prunus persica peak
Quercus sp exception
Raphanus sativus exception
Sesamum indicum peak
Setaria italica telomere
Solanum lycopersicum telomere
Solanum tuberosum telomere
Sorghum bicolor peak
Theobroma cacao telomere
Triticum aestivum telomere
Triticum dicoccoides telomere
Triticum urartu telomere
Vigna unguiculata telomere
Vitis vinifera telomere
Zea mays telomere
# The distribution of patterns among species
table(telomere_pattern$pattern)
## 
## exception      peak  telomere 
##         7        16        34
chromosome.stats.telomere = merge(chromosome.stats, telomere_pattern)
boxplot(phys.map.length ~ pattern, data = chromosome.stats.telomere)

summary(chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == 
    "telomere"], na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.68   28.64   48.58  128.38  115.51  830.83
summary(chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == 
    "peak"], na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.05   17.22   27.22   29.88   37.14   80.88
summary(chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == 
    "exception"], na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.32   31.61   51.70   68.25   72.22  336.97
wilcox.test(chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == 
    "telomere"], chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == 
    "peak"])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == "telomere"] and chromosome.stats.telomere$phys.map.length[chromosome.stats.telomere$pattern == "peak"]
## W = 61080, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
(ref:DistancesRelative-pooledChromosomes-patterns) Recombination rates are higher in distal regions and lower near the center of the chromosome. Standardized recombination rates (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Two patterns were identified and species were pooled accordingly. Recombination rates decrease immediately from the tip of the chromosome for 35 species (dark grey line and ribbon) or recombination rate is reduced in telomeric regions and the peak of recombination is in a sub-telomeric region for 16 species (light grey line and ribbon). The solid line represents the mean recombination rate estimated in a bin and upper and lower boundaries of the ribbon represent the maximum and minimum values for a particular pattern. Besides, patterns that were not classified (6 species) were represented by loess regression in grey dashed lines. For estimating recombination rates in bins of relative distance, chromosomes were split in halves, a distance of 0.5 being the center of the chromosome. Then, chromosomes were pooled per species.

Figure 4.4: (ref:DistancesRelative-pooledChromosomes-patterns) Recombination rates are higher in distal regions and lower near the center of the chromosome. Standardized recombination rates (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Two patterns were identified and species were pooled accordingly. Recombination rates decrease immediately from the tip of the chromosome for 35 species (dark grey line and ribbon) or recombination rate is reduced in telomeric regions and the peak of recombination is in a sub-telomeric region for 16 species (light grey line and ribbon). The solid line represents the mean recombination rate estimated in a bin and upper and lower boundaries of the ribbon represent the maximum and minimum values for a particular pattern. Besides, patterns that were not classified (6 species) were represented by loess regression in grey dashed lines. For estimating recombination rates in bins of relative distance, chromosomes were split in halves, a distance of 0.5 being the center of the chromosome. Then, chromosomes were pooled per species.

4.1.0.3 Figure 6

4.1.1 Figure 7

Redraw the periphery-bias ratio with colors according to the classification of patterns.

# The mean species correlation estimated by 1,000 bootstraps
mean(boot)
## [1] -0.5044995
quantile(boot, 0.025)
##       2.5% 
## -0.5868395
quantile(boot, 0.975)
##      97.5% 
## -0.4142717
# Species correlations
summary(species.stats$correlation_peripherybias)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.00000 -0.07143  0.13534  0.12078  0.39560  0.80952
cor.test(species.stats$peripherybias_ratio, species.stats$phys.map.length, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  species.stats$peripherybias_ratio and species.stats$phys.map.length
## S = 12226, p-value = 1.175e-06
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6037724
# Linear regression
lm_model = lm(peripherybias_ratio ~ phys.map.length, data = species.stats)
summary(lm_model)
## 
## Call:
## lm(formula = peripherybias_ratio ~ phys.map.length, data = species.stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3207 -0.8261 -0.2908  0.6577  2.6611 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.746358   0.176310   9.905 7.85e-14 ***
## phys.map.length 0.004417   0.000886   4.985 6.54e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.122 on 55 degrees of freedom
## Multiple R-squared:  0.3112, Adjusted R-squared:  0.2987 
## F-statistic: 24.85 on 1 and 55 DF,  p-value: 6.536e-06
## [1] 2.192275
##     2.5% 
## 2.066465
##    97.5% 
## 2.323914
(ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

Figure 4.5: (ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

(ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

Figure 4.6: (ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

(ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line. (ref:peripherybias-species) The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line. The periphery-bias ratio depends on the mean chromosome size across species (n = 38). The linear regression line and its parametric 95% confidence interval were estimated in ggplot2. A periphery-bias ratio above one indicates that recombination rate in the tips of the chromosome are higher than the mean chromosome recombination rate. The inset presents the distribution of periphery-bias ratios (n = 665 chromosomes). The mean periphery-bias ratio and its 95% confidence interval (black solid and dashed lines) were estimated by 1,000 bootstraps. The theoretical value for equal recombination in the tips than the rest of the chromosome (periphery-bias ratio of one) is the red vertical line.

4.1.1.1 Figure 5

4.2 Centromere effect

4.2.1 How many chromosomes have lower recombination in the centromere?

For each chromosome, estimate the recombination rate within the centromere (recombination rate of the 100 kb window overlapping the centromere position).

# How many chromosomes with a centromere position?
sum(!is.na(chromosome.stats$rec.in.centromere))
## [1] 425
# How many chromosomes with no recombination in centromere
sum(chromosome.stats$rec.in.centromere == 0, na.rm = TRUE)
## [1] 81
sum(chromosome.stats$rec.in.centromere == 0, na.rm = TRUE)/sum(!is.na(chromosome.stats$rec.in.centromere))
## [1] 0.1905882
# How many chromosomes with less recombination in centromere than the chromosome
# averaged recombination rate?
sum(chromosome.stats$rec.in.centromere < chromosome.stats$mean.recrate, na.rm = TRUE)
## [1] 388
sum(chromosome.stats$rec.in.centromere < chromosome.stats$mean.recrate, na.rm = TRUE)/sum(!is.na(chromosome.stats$rec.in.centromere))
## [1] 0.9129412

Resampling test for each chromosome. Is the centromeric rec. rate significantly lower than the chromosome averaged rec. rate?

# How many chromosomes with SIGNIFICANTLY less recombination in centromere than
# the chromosome averaged recombination rate?

# Resampling test. 1000 bootstraps.
sum(chromosome.stats$rec.in.centromere < chromosome.stats$lower.average.recrate, 
    na.rm = TRUE)
## [1] 385
sum(chromosome.stats$rec.in.centromere < chromosome.stats$lower.average.recrate, 
    na.rm = TRUE)/sum(!is.na(chromosome.stats$rec.in.centromere))
## [1] 0.9058824
# And significantly higher?
sum(chromosome.stats$rec.in.centromere > chromosome.stats$upper.average.recrate, 
    na.rm = TRUE)
## [1] 30
sum(chromosome.stats$rec.in.centromere > chromosome.stats$upper.average.recrate, 
    na.rm = TRUE)/sum(!is.na(chromosome.stats$rec.in.centromere))
## [1] 0.07058824
par(mfrow = c(1, 2))
hist(chromosome.stats$rec.in.centromere, main = "", xlab = "Recombination in centromere", 
    breaks = 50)
hist(chromosome.stats$boot.average.recrate, main = "", xlab = "Averaged recombination", 
    breaks = 50)

par(mfrow = c(1, 1))

4.2.2 Formal test of a centromere effect

Proportion of chromosomes that are metacentric or sub-metacentric in the dataset (long arm/short arm ratio < 3).

ratio = rep(NA, nrow(chromosome.stats))
for (i in 1:nrow(chromosome.stats)) {
    ratio[i] = max(c(chromosome.stats$centromeric_index[i], (1 - chromosome.stats$centromeric_index[i])))/min(c(chromosome.stats$centromeric_index[i], 
        (1 - chromosome.stats$centromeric_index[i])))
}
hist(ratio, main = "", breaks = 30)

sum(ratio < 3, na.rm = TRUE)/sum(!is.na(ratio))
## [1] 0.92

With d(x) the genetic distance at the relative physical position x (x betwwen 0 and 1) and c the centromeric index.

  1. Haenel et al. (2018) telomere model assumes that d(1/2) = d(1) - d(1/2), hence d(1/2)/d(1) = 1/2

  2. If a centromere effect and 1 co/chromosome, d(c) = c x 50 + a x c and d(1) – d(c) = (1-c) x 50 + a x (1-c). So, d(c) / d(1) = c

  3. If a centromere effect and 1 co/chromosome arm, d(c) = 50 + a x c and d(1) – d(c) = 50 + a x (1-c). So, (d(c) -50) / (d(1) – 100) = c

For chromosomes with a centromeric index, we used the Marey map to compute the predictions of the model.

Verify the predictions.

# How many predictions are negative for one CO per chromosome arm (i.e. shorter
# arm < 50cM)?  Count:
sum((predictions_centromereModels$pred_centromere_arm < 0), na.rm = TRUE)
## [1] 94
# Proportion:
sum((predictions_centromereModels$pred_centromere_arm < 0)/nrow(predictions_centromereModels), 
    na.rm = TRUE)
## [1] 0.2211765
par(mfrow = c(1, 2))
hist(predictions_centromereModels$pred_telomere, xlab = "Predicted ratio", breaks = 30, 
    main = "")
# Distribution centered on 0.5, but too much variance than expected under the
# strict telomere model of Haenel et al. (2018)
plot(rep(0.5, nrow(predictions_centromereModels)), predictions_centromereModels$pred_telomere, 
    xlab = "Predicted", ylab = "Observed", main = "")

# The prediction of a ratio constant at 0.5 is rejected

hist(predictions_centromereModels$pred_centromere_chromosome, xlab = "Predicted ratio", 
    breaks = 30, main = "")
plot(predictions_centromereModels$centromeric_index, predictions_centromereModels$pred_centromere_chromosome, 
    xlab = "Predicted", ylab = "Observed", main = "")

hist(predictions_centromereModels$pred_centromere_arm, xlab = "Predicted ratio", 
    breaks = 30, main = "")
plot(predictions_centromereModels$centromeric_index, predictions_centromereModels$pred_centromere_arm, 
    xlab = "Predicted", ylab = "Observed", main = "")

# Manually remove predictions outside (0,1)
sum(predictions_centromereModels$pred_centromere_arm > 1 | predictions_centromereModels$pred_centromere_arm < 
    0, na.rm = TRUE)/sum(!is.na(predictions_centromereModels$pred_centromere_arm), 
    na.rm = TRUE)
## [1] 0.4882075
hist(predictions_centromereModels$pred_centromere_arm[which(predictions_centromereModels$pred_centromere_arm < 
    1 & predictions_centromereModels$pred_centromere_arm > 0)], xlab = "Predicted ratio", 
    breaks = 30, main = "")
plot(predictions_centromereModels$centromeric_index[which(predictions_centromereModels$pred_centromere_arm < 
    1 & predictions_centromereModels$pred_centromere_arm > 0)], predictions_centromereModels$pred_centromere_arm[which(predictions_centromereModels$pred_centromere_arm < 
    1 & predictions_centromereModels$pred_centromere_arm > 0)], xlab = "Predicted", 
    ylab = "Observed", main = "")

par(mfrow = c(1, 1))

Estimate the correlation expected-observed for the best model: centromere with one crossover per chromosome.

# Sample size
nrow(predictions_centromereModels)
## [1] 425
cor.test(predictions_centromereModels$centromeric_index, predictions_centromereModels$pred_centromere_chromosome, 
    method = "spearman")
## Warning in cor.test.default(predictions_centromereModels$centromeric_index, :
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  predictions_centromereModels$centromeric_index and predictions_centromereModels$pred_centromere_chromosome
## S = 3626920, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7165184

Make the figure.

## [1] 0.4539616
##      2.5% 
## 0.4338565
##     97.5% 
## 0.4730382
(ref:centromere-predictions) Correlation observed-predicted under the centromere model with one mandatory crossover.

Figure 4.7: (ref:centromere-predictions) Correlation observed-predicted under the centromere model with one mandatory crossover.

4.2.3 Model selection

Model selection based on R2 across species.

centromere_modselection = data.frame(Model = c("Telomere", "Centromere (arm)", "Centromere (chromosome)", 
    "Telomere (subset)", "Centromere (arm) (subset)", "Centromere (chromosome) (subset)", 
    "Telomere (trueCI)", "Centromere (arm) (trueCI)", "Centromere (chromosome) (trueCI)"), 
    Expected = rep(c("d(1/2) / d(1) = 0.5", "(d(c) -50) / (d(1) – 100) = c", "d(c) / d(1) = c"), 
        3), AdjustedRsquared = NA, Pvalue = NA, AIC = NA, BIC = NA, Species = NA)

predictions_centromereModels$pred_centromere_arm[which(predictions_centromereModels$pred_centromere_arm == 
    Inf)] = NA

mod = lm(pred_telomere ~ centromeric_index, data = predictions_centromereModels)
summary(mod)$r.squared
## [1] 0.216888
centromere_modselection$AdjustedRsquared[1] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[1] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[1] = AIC(mod)
centromere_modselection$BIC[1] = BIC(mod)

mod = lm(pred_centromere_arm ~ centromeric_index, data = predictions_centromereModels)
summary(mod)
## 
## Call:
## lm(formula = pred_centromere_arm ~ centromeric_index, data = predictions_centromereModels)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -127.606   -0.223    0.208    0.722   53.840 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)         0.8596     1.5918   0.540    0.589
## centromeric_index  -1.1815     3.2745  -0.361    0.718
## 
## Residual standard error: 9.299 on 422 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0003084,  Adjusted R-squared:  -0.002061 
## F-statistic: 0.1302 on 1 and 422 DF,  p-value: 0.7184
centromere_modselection$AdjustedRsquared[2] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[2] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[2] = AIC(mod)
centromere_modselection$BIC[2] = BIC(mod)

mod = lm(pred_centromere_chromosome ~ centromeric_index, data = predictions_centromereModels)
summary(mod)
## 
## Call:
## lm(formula = pred_centromere_chromosome ~ centromeric_index, 
##     data = predictions_centromereModels)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44215 -0.06600  0.00659  0.08467  0.42330 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.01549    0.02349   -0.66     0.51    
## centromeric_index  1.00752    0.04826   20.88   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1375 on 423 degrees of freedom
## Multiple R-squared:  0.5075, Adjusted R-squared:  0.5064 
## F-statistic: 435.9 on 1 and 423 DF,  p-value: < 2.2e-16
centromere_modselection$AdjustedRsquared[3] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[3] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[3] = AIC(mod)
centromere_modselection$BIC[3] = BIC(mod)

Linear regression and model selection based on R2 criterion within species with at least 5 chromosomes.

table(species$bestmodel)
## 
##  2  3 
##  7 30
centromere_modselection$Species[1] = sum(species$bestmodel == 1)
centromere_modselection$Species[2] = sum(species$bestmodel == 2)
centromere_modselection$Species[3] = sum(species$bestmodel == 3)

Redo model selection but on a subset of chromosomes with no chromosome arm below 50cM.

# Sample size: number of chromosomes:
nrow(predictions_centromereModels[which(predictions_centromereModels$pred_centromere_arm > 
    0), ])
## [1] 330
predictions_centromereModels_subset = predictions_centromereModels[which(predictions_centromereModels$pred_centromere_arm > 
    0), ]

mod = lm(pred_telomere ~ centromeric_index, data = predictions_centromereModels_subset)
summary(mod)$r.squared
## [1] 0.1871755
centromere_modselection$AdjustedRsquared[4] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[4] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[4] = AIC(mod)
centromere_modselection$BIC[4] = BIC(mod)

mod = lm(pred_centromere_arm ~ centromeric_index, data = predictions_centromereModels_subset)
summary(mod)
## 
## Call:
## lm(formula = pred_centromere_arm ~ centromeric_index, data = predictions_centromereModels_subset)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.762 -1.175 -0.848 -0.369 52.357 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)         0.8381     0.9167   0.914    0.361
## centromeric_index   1.5056     1.8734   0.804    0.422
## 
## Residual standard error: 4.541 on 328 degrees of freedom
## Multiple R-squared:  0.001965,   Adjusted R-squared:  -0.001077 
## F-statistic: 0.6459 on 1 and 328 DF,  p-value: 0.4222
centromere_modselection$AdjustedRsquared[5] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[5] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[5] = AIC(mod)
centromere_modselection$BIC[5] = BIC(mod)

mod = lm(pred_centromere_chromosome ~ centromeric_index, data = predictions_centromereModels_subset)
summary(mod)
## 
## Call:
## lm(formula = pred_centromere_chromosome ~ centromeric_index, 
##     data = predictions_centromereModels_subset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43678 -0.06441  0.01070  0.07741  0.42520 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.009085   0.026649  -0.341    0.733    
## centromeric_index  0.988207   0.054461  18.145   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.132 on 328 degrees of freedom
## Multiple R-squared:  0.5009, Adjusted R-squared:  0.4994 
## F-statistic: 329.2 on 1 and 328 DF,  p-value: < 2.2e-16
centromere_modselection$AdjustedRsquared[6] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[6] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[6] = AIC(mod)
centromere_modselection$BIC[6] = BIC(mod)

Linear regression and model selection based on R2 criterion within species with at least 5 chromosomes.

table(species_subset$bestmodel)
## 
##  2  3 
## 10 26
centromere_modselection$Species[4] = sum(species_subset$bestmodel == 1)
centromere_modselection$Species[5] = sum(species_subset$bestmodel == 2)
centromere_modselection$Species[6] = sum(species_subset$bestmodel == 3)

Redo model selection but on a subset of chromosomes with a correct CI orientation assessed (remove wrong orientation) 50cM.

id = which(paste(predictions_centromereModels$set, predictions_centromereModels$chromosome) %in% 
    paste(chromosome.stats$set[idx_wrongCIorientation], chromosome.stats$chromosome[idx_wrongCIorientation]))

paste(predictions_centromereModels$set[id], predictions_centromereModels$chromosome[id])
##  [1] "Boechera_stricta_Lee2020 2"                
##  [2] "Boechera_stricta_Lee2020 5"                
##  [3] "Brassica_napus_Yang2017 A07"               
##  [4] "Brassica_rapa_CodyMarkelz2017 A01"         
##  [5] "Capsella_rubella_Slotte2013 1"             
##  [6] "Capsella_rubella_Slotte2013 4"             
##  [7] "Capsicum_annuum_Han2016 3"                 
##  [8] "Capsicum_annuum_Han2016 12"                
##  [9] "Cenchrus_americanus_Pucher2017 2"          
## [10] "Cenchrus_americanus_Pucher2017 6"          
## [11] "Citrullus_lanatus_Ren2015 3"               
## [12] "Citrullus_lanatus_Ren2015 7"               
## [13] "Citrullus_lanatus_Ren2015 10"              
## [14] "Citrus_sinensis_Huang2018 4"               
## [15] "Coffea_canephora_Crouzillat2020 A"         
## [16] "Coffea_canephora_Crouzillat2020 K"         
## [17] "Cucumis_melo_Pereira2018 4"                
## [18] "Cucumis_melo_Pereira2018 7"                
## [19] "Cucumis_melo_Pereira2018 9"                
## [20] "Cucumis_sativus_Zhu2016 6"                 
## [21] "Cucumis_sativus_Zhu2016 5"                 
## [22] "Elaeis_guineensis_Yaakub2020 5"            
## [23] "Elaeis_guineensis_Yaakub2020 1"            
## [24] "Elaeis_guineensis_Yaakub2020 12"           
## [25] "Eucalyptus_grandis_Bertholome2015 1"       
## [26] "Eucalyptus_grandis_Bertholome2015 6"       
## [27] "Eucalyptus_grandis_Bertholome2015 7"       
## [28] "Eucalyptus_grandis_Bertholome2015 8"       
## [29] "Gossypium_hirsutum_Zhang2019 A11"          
## [30] "Lupinus_angustifolius_Zhou2017 8"          
## [31] "Lupinus_angustifolius_Zhou2017 10"         
## [32] "Lupinus_angustifolius_Zhou2017 19"         
## [33] "Lupinus_angustifolius_Zhou2017 15"         
## [34] "Lupinus_angustifolius_Zhou2017 3"          
## [35] "Lupinus_angustifolius_Zhou2017 7"          
## [36] "Malus_domestica_DiPierro2016 3"            
## [37] "Manihot_esculenta_ICGMC2015 4"             
## [38] "Manihot_esculenta_ICGMC2015 6"             
## [39] "Manihot_esculenta_ICGMC2015 8"             
## [40] "Manihot_esculenta_ICGMC2015 12"            
## [41] "Momordica_charantia_Mastumura2020 2"       
## [42] "Nelumbo_nucifera_Gui2018 1"                
## [43] "Nelumbo_nucifera_Gui2018 5"                
## [44] "Oryza_nivara_Ma2016 9"                     
## [45] "Oryza_nivara_Ma2016 12"                    
## [46] "Oryza_sativa_DeLeon2016 3"                 
## [47] "Oryza_sativa_DeLeon2016 5"                 
## [48] "Oryza_sativa_DeLeon2016 6"                 
## [49] "Phaseolus_vulgaris_Song2015 4"             
## [50] "Phaseolus_vulgaris_Song2015 10"            
## [51] "Phaseolus_vulgaris_Song2015 11"            
## [52] "Prunus_persica_Verde2017 3"                
## [53] "Sesamum_indicum_Wang2016 1"                
## [54] "Sesamum_indicum_Wang2016 3"                
## [55] "Sesamum_indicum_Wang2016 4"                
## [56] "Solanum_lycopersicum_Gonda2018 12"         
## [57] "Solanum_tuberosum_Endelman2016 1"          
## [58] "Solanum_tuberosum_Endelman2016 4"          
## [59] "Sorghum_bicolor_Zou2012 6"                 
## [60] "Sorghum_bicolor_Zou2012 7"                 
## [61] "Sorghum_bicolor_Zou2012 10"                
## [62] "Theobroma_cacao_Royaert2016 2"             
## [63] "Theobroma_cacao_Royaert2016 6"             
## [64] "Theobroma_cacao_Royaert2016 7"             
## [65] "Triticum_aestivum_GutierrezGonzalez2019 4B"
## [66] "Triticum_aestivum_GutierrezGonzalez2019 4D"
## [67] "Vitis_vinifera_Brault2020 4"               
## [68] "Vitis_vinifera_Brault2020 6"               
## [69] "Vitis_vinifera_Brault2020 9"               
## [70] "Vitis_vinifera_Brault2020 12"
# Sample size: number of chromosomes:
nrow(predictions_centromereModels[-id, ])
## [1] 355
predictions_centromereModels_subset = predictions_centromereModels[-id, ]

mod = lm(pred_telomere ~ centromeric_index, data = predictions_centromereModels_subset)
summary(mod)$r.squared
## [1] 0.3203657
centromere_modselection$AdjustedRsquared[7] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[7] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[7] = AIC(mod)
centromere_modselection$BIC[7] = BIC(mod)

mod = lm(pred_centromere_arm ~ centromeric_index, data = predictions_centromereModels_subset)
summary(mod)
## 
## Call:
## lm(formula = pred_centromere_arm ~ centromeric_index, data = predictions_centromereModels_subset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -127.484   -0.178    0.256    0.749   53.928 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)          0.907      1.915   0.474    0.636
## centromeric_index   -1.424      3.993  -0.357    0.722
## 
## Residual standard error: 10.16 on 352 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.000361,   Adjusted R-squared:  -0.002479 
## F-statistic: 0.1271 on 1 and 352 DF,  p-value: 0.7217
centromere_modselection$AdjustedRsquared[8] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[8] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[8] = AIC(mod)
centromere_modselection$BIC[8] = BIC(mod)

mod = lm(pred_centromere_chromosome ~ centromeric_index, data = predictions_centromereModels_subset)
summary(mod)
## 
## Call:
## lm(formula = pred_centromere_chromosome ~ centromeric_index, 
##     data = predictions_centromereModels_subset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41421 -0.06353  0.00445  0.08306  0.38753 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.05477    0.02459  -2.227   0.0266 *  
## centromeric_index  1.10408    0.05120  21.563   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1308 on 353 degrees of freedom
## Multiple R-squared:  0.5684, Adjusted R-squared:  0.5672 
## F-statistic:   465 on 1 and 353 DF,  p-value: < 2.2e-16
centromere_modselection$AdjustedRsquared[9] = summary(mod)$adj.r.squared
centromere_modselection$Pvalue[9] = summary(mod)$coefficients[2, 4]
centromere_modselection$AIC[9] = AIC(mod)
centromere_modselection$BIC[9] = BIC(mod)

Linear regression and model selection based on R2 criterion within species with at least 5 chromosomes.

table(species_subset$bestmodel)
## 
##  2  3 
##  7 30
centromere_modselection$Species[7] = sum(species_subset$bestmodel == 1)
centromere_modselection$Species[8] = sum(species_subset$bestmodel == 2)
centromere_modselection$Species[9] = sum(species_subset$bestmodel == 3)

Save model selection…

5 Genomic landscapes correlate to recombination landscapes

# Sample size - Species
sum(annotated$annotation)
## [1] 41
# Sample size - Chromosomes
length(unique(data$map))
## [1] 483
cat("Species with annotations that were integrated in the dataset:\n")
## Species with annotations that were integrated in the dataset:
paste(annotated$species[annotated$annotation], annotated$accession[annotated$annotation])
##  [1] "Arabidopsis_thaliana GCA_000001735.2"            
##  [2] "Arachis_duranensis GCF_000817695.2"              
##  [3] "Arachis_hypogaea GCA_003713155.1"                
##  [4] "Brachypodium_distachyon GCA_000005505.4"         
##  [5] "Brassica_napus GCF_000686985.2"                  
##  [6] "Brassica_rapa GCF_000309985.1"                   
##  [7] "Camelina_sativa GCF_000633955.1"                 
##  [8] "Camellia_sinensis GCA_013676235.1"               
##  [9] "Capsella_rubella GCA_000375325.1"                
## [10] "Citrus_sinensis GCA_000317415.1"                 
## [11] "Cucumis_melo v4.0"                               
## [12] "Cucumis_sativus GCA_000004075.2"                 
## [13] "Cucurbita_maxima Cmaxima_v1.1"                   
## [14] "Cucurbita_pepo GCF_002806865.2"                  
## [15] "Dioscorea_alata GCA_002240015.2"                 
## [16] "Elaeis_guineensis GCF_000442705.1"               
## [17] "Eucalyptus_grandis Egrandis_297_v2.0"            
## [18] "Glycine_max GCF_000004515.4"                     
## [19] "Gossypium_hirsutum HAU_G.hirsutum_AD1genome_v1.1"
## [20] "Gossypium_raimondii GCA_000327365.1"             
## [21] "Helianthus_annuus GCA_002127325.1"               
## [22] "Hordeum_vulgare GCA_900075435.2"                 
## [23] "Lupinus_albus GCA_009771035.1"                   
## [24] "Malus_domestica GCA_004115385.1"                 
## [25] "Manihot_esculenta GCA_001659605.1"               
## [26] "Oryza_nivara GCA_000576065.1"                    
## [27] "Oryza_sativa GCA_001433935.1"                    
## [28] "Panicum_hallii GCA_002211085.2"                  
## [29] "Phaseolus_vulgaris GCA_001517995.1"              
## [30] "Prunus_mume GCF_000346735.1"                     
## [31] "Prunus_persica GCA_000346465.2"                  
## [32] "Sesamum_indicum GCF_000512975.1"                 
## [33] "Setaria_italica GCA_000263155.2"                 
## [34] "Solanum_lycopersicum GCA_000188115.3"            
## [35] "Solanum_tuberosum PGSC_DM_v4.03"                 
## [36] "Sorghum_bicolor GCA_000003195.3"                 
## [37] "Theobroma_cacao GCA_000403535.1"                 
## [38] "Triticum_aestivum GCA_900519105.1"               
## [39] "Vigna_unguiculata GCF_004118075.1"               
## [40] "Vitis_vinifera GCA_000003745.2"                  
## [41] "Zea_mays GCA_000005005.6"
cat("Species without annotations:\n")
## Species without annotations:
paste(annotated$species[!(annotated$annotation)], annotated$accession[!(annotated$annotation)])
##  [1] "Aegilops_speltoides GCA_000347335.2"  
##  [2] "Aegilops_speltoides GCA_000347335.2"  
##  [3] "Boechera_stricta NA"                  
##  [4] "Capsicum_annuum GCA_002878395.2"      
##  [5] "Cenchrus_americanus GCA_002174835.2"  
##  [6] "Citrullus_lanatus GCA_000238415.2"    
##  [7] "Coffea_canephora NA"                  
##  [8] "Draba_nivalis PRJNA657155"            
##  [9] "Juglans_regia NA"                     
## [10] "Lupinus_angustifolius GCA_002285895.2"
## [11] "Mangifera_indica GCA_011075055.1"     
## [12] "Momordica_charantia GCA_013281855.1"  
## [13] "Nelumbo_nucifera GCA_003033695.1"     
## [14] "Quercus_sp GCA_900291515.1"           
## [15] "Raphanus_sativus GCA_002197605.1"     
## [16] "Triticum_dicoccoides GCA_002575655.1" 
## [17] "Triticum_urartu GCA_003073215.1"

Most species had rarely more than 20 genes in a single window and variance of the averaged recombination rate increased in the upper range of gene counts (we also observed an inflection of the variance around 20 genes in raw data), hence for gene count the quadratic regression was fitted to a subset of windows with at most 20 genes. We considered that windows with more genes were less reliable than the rest of the genome (e.g. too much annotations, overlapping genes).

hist(data$rec.rate, breaks = 100, main = "", xlab = "Recombination rate", xlim = c(0, 
    20))

hist(data$gene_count, breaks = 100, main = "", xlab = "Gene count", xlim = c(0, 50))

cat("Proportion of windows with a null recombination rate (i.e. < 0.001):\n")
## Proportion of windows with a null recombination rate (i.e. < 0.001):
sum(data$rec.rate < 0.001, na.rm = TRUE)/sum(!is.na(data$rec.rate), na.rm = TRUE)
## [1] 0.1259402
ggplot(data = meanrecrate_genecount, aes(x = gene_count, group = species)) + geom_density(alpha = 0.4) + 
    labs(x = "Gene count") + geom_vline(aes(xintercept = 30)) + theme(axis.line = element_line(), 
    panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), 
    panel.background = element_blank(), plot.title = element_text(color = "black", 
        size = 16, face = "bold.italic", hjust = 0.5), plot.subtitle = element_text(color = "black", 
        size = 16, hjust = 0.5), axis.title.x = element_text(color = "black", size = 16), 
    axis.title.y = element_text(color = "black", size = 16), axis.text = element_text(size = 16, 
        colour = "black"), strip.text = element_text(size = 12, colour = "black", 
        angle = 90), legend.position = "none", legend.key = element_rect(fill = "white", 
        size = 1), legend.key.width = unit(2, "cm"), legend.text = element_text(size = 14), 
    legend.title = element_text(size = 16))

5.1 Within species, chromosome-level

At a chromosome level, we expect that windows with more genes will exhibit higher recombination rates. We tested this prediction with a Spearman correlation on 100kb windows for each chromosome. Furthermore, I tested the significance of the correlation for each chromosome and the significance of the relationship among species with a Linear Mixed Model (not shown here) taking into account repeated measurement within species and the structure of the dataset (species and chromosomes were random effects).

## [1] 0.4619748
##      2.5% 
## 0.4364954
##     97.5% 
## 0.4862635

cat("Number of chromosomes analyzed:\n")
## Number of chromosomes analyzed:
sum(!is.na(cor_genecount$correlation))
## [1] 447
cat("Proportion of chromosomes with a significant Spearman correlation:\n")
## Proportion of chromosomes with a significant Spearman correlation:
sum(cor_genecount$signif, na.rm = TRUE)/sum(!is.na(cor_genecount$signif))
## [1] 0.9082774

5.2 Across species: Chromosomes pooled per species.

Is it a common pattern among species?

In order to assess the interspecific differences in the relationship between recombination rates and gene count, I pooled chromosomes per species.

colnames(spcorr)[4] = "n"
kable(spcorr, digits = 2, align = "c", caption = "Species Spearman's rank correlation between recombination rates and gene number.")
Table 5.1: Species Spearman’s rank correlation between recombination rates and gene number.
species correlation signif n phys.map.length.y mean.recrate
Arabidopsis thaliana 0.28 1.00 5 23.83 4.08
Arachis duranensis 0.53 0.88 8 100.95 1.02
Arachis hypogaea 0.63 1.00 18 124.68 1.53
Brachypodium distachyon 0.62 1.00 3 45.62 6.05
Brassica napus 0.21 1.00 12 44.72 3.81
Brassica rapa 0.34 1.00 7 26.14 4.59
Camelina sativa 0.54 1.00 15 30.63 3.64
Camellia sinensis 0.12 0.89 5 222.11 0.61
Capsella rubella -0.07 0.57 7 15.84 3.15
Citrus sinensis 0.19 0.88 8 26.02 2.83
Cucumis melo 0.58 1.00 11 30.99 4.27
Cucumis sativus 0.19 0.71 7 27.41 5.69
Cucurbita maxima 0.27 0.58 19 10.35 17.57
Cucurbita pepo 0.27 0.50 17 10.79 13.53
Dioscorea alata 0.11 0.67 14 23.85 5.28
Elaeis guineensis 0.14 0.83 11 46.94 2.65
Eucalyptus grandis 0.41 1.00 11 55.69 1.40
Glycine max 0.74 1.00 19 48.13 3.11
Gossypium hirsutum 0.54 1.00 26 85.92 2.45
Gossypium raimondii 0.62 1.00 13 57.63 3.00
Helianthus annuus 0.13 1.00 13 169.70 0.47
Hordeum vulgare 0.40 1.00 7 654.86 0.24
Lupinus albus 0.63 1.00 25 17.36 7.00
Malus domestica 0.64 1.00 17 38.38 2.00
Manihot esculenta 0.11 0.61 18 28.80 6.12
Oryza nivara 0.24 0.86 12 28.16 2.81
Oryza sativa 0.51 1.00 12 31.10 4.42
Panicum hallii 0.70 1.00 9 56.38 2.05
Phaseolus vulgaris 0.76 1.00 11 46.80 2.00
Prunus mume 0.36 0.86 7 24.99 7.26
Prunus persica 0.71 1.00 8 28.21 2.85
Sesamum indicum 0.10 0.60 13 16.41 4.39
Setaria italica 0.67 1.00 9 44.59 3.61
Solanum lycopersicum 0.68 1.00 12 67.27 1.94
Solanum tuberosum 0.52 1.00 12 60.42 1.27
Sorghum bicolor 0.71 1.00 10 68.36 2.44
Theobroma cacao 0.83 1.00 10 33.05 2.60
Triticum aestivum 0.32 1.00 20 667.83 0.21
Vigna unguiculata 0.72 1.00 11 43.04 1.78
Vitis vinifera 0.57 1.00 19 22.43 2.79
Zea mays 0.47 1.00 10 210.63 0.87
hist(spcorr$correlation, breaks = 30, xlab = "Species Spearman correlation", main = "", 
    xlim = c(-1, 1))
abline(v = 0, col = "Red", lty = 2)

# df = spcorr[,1:4] colnames(df) = c('Species', 'Mean correlation', 'Proportion
# of significant correlations', 'Number of chromosomes') write.xlsx(x = df, file
# = paste(wd, 'tables/article_one_supp/tables_supplementary.xls', sep = ''),
# sheetName = 'S10_correlationRecombinationGenes', row.names = FALSE, append =
# TRUE)

Correlations seem uniformly distributed among species from zero to one. There is only few cases of negative correlation. The correlation previously inferred seems robust, yet the strength of the relationship vary greatly among species.

All chromosomes pooled, the mean recombination rate per gene count and confidence intervals at 95% were estimated for each species (1,000 bootstraps).

(ref:genecount-allspecies) Recombination rate (cM/Mb) depends on the gene count for 44 species (chromosomes pooled per species). Mean recombination rates and confidence intervals (95%) estimated by 1,000 bootstraps. Gene count was made by counting the number of gene starting positions within the window.

Figure 5.1: (ref:genecount-allspecies) Recombination rate (cM/Mb) depends on the gene count for 44 species (chromosomes pooled per species). Mean recombination rates and confidence intervals (95%) estimated by 1,000 bootstraps. Gene count was made by counting the number of gene starting positions within the window.

(ref:genecount-allspecies) Recombination rate (cM/Mb) depends on the gene count for 44 species (chromosomes pooled per species). Mean recombination rates and confidence intervals (95%) estimated by 1,000 bootstraps. Gene count was made by counting the number of gene starting positions within the window.

Figure 5.2: (ref:genecount-allspecies) Recombination rate (cM/Mb) depends on the gene count for 44 species (chromosomes pooled per species). Mean recombination rates and confidence intervals (95%) estimated by 1,000 bootstraps. Gene count was made by counting the number of gene starting positions within the window.

We see that the strength of the pattern varies among species. Intuitively, we can think that the slope of the relationship may depend on chromosome size, as many other patterns already observed.

# cor_genecount = merge(cor_genecount, chromosome.stats[,c(1, 3, 8)]) plot(x =
# cor_genecount$phys.map.length, y = cor_genecount$correlation, log = 'x')

# Estimate the slope of the relationship for each species Data = 100kb windows

# Estimate the slope of the relationship for each species Data = pooled
# chromosomes, recombination rate per gene count
spcorr$intercept = NA
spcorr$slope = NA
for (i in 1:nrow(spcorr)) {
    subset = meanrecrate_genecount[which(gsub("_", " ", meanrecrate_genecount$species) == 
        spcorr$species[i]), ]
    lm_model = lm(mean_rec ~ gene_count, data = subset)
    spcorr$intercept[i] = lm_model$coefficients[1]
    spcorr$slope[i] = lm_model$coefficients[2]
}

# predictions not verified None effect of chromosome length
plot(x = spcorr$phys.map.length, y = spcorr$slope, log = "x")

cor.test(x = spcorr$phys.map.length, y = spcorr$slope, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  spcorr$phys.map.length and spcorr$slope
## S = 11346, p-value = 0.9424
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.01167247
# And mean recombination rate?
plot(x = spcorr$mean.recrate, y = spcorr$slope, log = "x", xlab = "Mean recombination rate", 
    ylab = "Slope")

# Significance of teh relatiionship
cor.test(x = spcorr$mean.recrate, y = spcorr$slope, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  spcorr$mean.recrate and spcorr$slope
## S = 10572, p-value = 0.6219
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.07909408
summary(lm(slope ~ mean.recrate, data = spcorr))
## 
## Call:
## lm(formula = slope ~ mean.recrate, data = spcorr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25164 -0.08751 -0.03233  0.05596  0.41274 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.099649   0.032511   3.065  0.00394 **
## mean.recrate 0.001453   0.006660   0.218  0.82839   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1385 on 39 degrees of freedom
## Multiple R-squared:  0.00122,    Adjusted R-squared:  -0.02439 
## F-statistic: 0.04762 on 1 and 39 DF,  p-value: 0.8284

5.3 Is it a common pattern among species?

summary(lm(mean_rec ~ gene_count, data = meanrecrate_genecount_subset))
## 
## Call:
## lm(formula = mean_rec ~ gene_count, data = meanrecrate_genecount_subset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4534 -0.3457  0.0101  0.3602  3.1616 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.219824   0.042469  -28.72   <2e-16 ***
## gene_count   0.124219   0.003695   33.62   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6328 on 820 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.5795, Adjusted R-squared:  0.579 
## F-statistic:  1130 on 1 and 820 DF,  p-value: < 2.2e-16

Quadratic regression.

summary(lm(mean_rec ~ gene_count + I(gene_count^2), data = meanrecrate_genecount_subset))
## 
## Call:
## lm(formula = mean_rec ~ gene_count + I(gene_count^2), data = meanrecrate_genecount_subset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1784 -0.3209 -0.0092  0.3239  3.5239 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -1.5821115  0.0572722 -27.624   <2e-16 ***
## gene_count       0.2394674  0.0133397  17.952   <2e-16 ***
## I(gene_count^2) -0.0058240  0.0006501  -8.959   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6043 on 819 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.6171, Adjusted R-squared:  0.6161 
## F-statistic: 659.9 on 2 and 819 DF,  p-value: < 2.2e-16
(ref:genecount-quadraticregression) Recombination rate depends on the gene count for 44 species both across and within species (A) Mean recombination rates (chromosomes pooled per species, one color per species) estimated by 1,000 bootstraps and standardized within species. Gene count was made by counting the number of gene starting positions within 100kb windows with a recombination rate. Black line with grey ribbon is the quadratic regression estimated by linear regression. (B) Distribution of chromosome Spearman rank correlations between the number of genes and the recombination rate in 100kb windows (n = 488 chromosomes). The black vertical lines are the mean correlation across chromosomes (solid) and 95% confidence interval (dashed) estimated by 1,000 bootstraps. (C) Slope of the species linear regression between gene count and recombination rates. Linear regression estimated on chromosome-level data with recombination rates estimated in quantiles of gene count. The linear regression is not significant (p = 0.72).

Figure 5.3: (ref:genecount-quadraticregression) Recombination rate depends on the gene count for 44 species both across and within species (A) Mean recombination rates (chromosomes pooled per species, one color per species) estimated by 1,000 bootstraps and standardized within species. Gene count was made by counting the number of gene starting positions within 100kb windows with a recombination rate. Black line with grey ribbon is the quadratic regression estimated by linear regression. (B) Distribution of chromosome Spearman rank correlations between the number of genes and the recombination rate in 100kb windows (n = 488 chromosomes). The black vertical lines are the mean correlation across chromosomes (solid) and 95% confidence interval (dashed) estimated by 1,000 bootstraps. (C) Slope of the species linear regression between gene count and recombination rates. Linear regression estimated on chromosome-level data with recombination rates estimated in quantiles of gene count. The linear regression is not significant (p = 0.72).

5.3.0.1 Figure 8

Though the relationship seems a common feature among species, some of them are outliers and show a stronger relationship than the rest of species. It would be interesting to look at specific relationships to identify outlying patterns.

5.4 Gene landscapes correlate to recombination landscape

Plot gene landscapes in the same manner as recombination landscapes in (ref:DistancesRelative-pooledChromosomes-patterns) Recombination rates are higher in distal regions and lower near the center of the chromosome. Standardized recombination rates (cM/Mb) as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Two patterns were identified and species were pooled accordingly. Recombination rates decrease immediately from the tip of the chromosome for 35 species (dark grey line and ribbon) or recombination rate is reduced in telomeric regions and the peak of recombination is in a sub-telomeric region for 16 species (light grey line and ribbon). The solid line represents the mean recombination rate estimated in a bin and upper and lower boundaries of the ribbon represent the maximum and minimum values for a particular pattern. Besides, patterns that were not classified (6 species) were represented by loess regression in grey dashed lines. For estimating recombination rates in bins of relative distance, chromosomes were split in halves, a distance of 0.5 being the center of the chromosome. Then, chromosomes were pooled per species..

Number of genes ~ distance to the telomere.

Standardized gene count as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. Linear quadratic regression estimated is the black line (95% confidence interval in dark grey).

Figure 5.4: Standardized gene count as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. Linear quadratic regression estimated is the black line (95% confidence interval in dark grey).

Standardized gene count as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. A loess regression was estimated for each species.

Figure 5.5: Standardized gene count as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Chromosomes were split in halves, distance 0.5 being the center of the chromosome. Then, chromosomes were pooled per species. Each color is a species. A loess regression was estimated for each species.

(ref:DistancesRelative-pooledChromosomes-genecount)

Figure 5.6: (ref:DistancesRelative-pooledChromosomes-genecount)

5.4.0.1 Figure S7

Classify gene patterns as we did for crossover patterns.

table(telomere_pattern$gene_pattern)
## 
## exception      peak  telomere 
##         5         7        29
table(telomere_pattern$pattern)
## 
## exception      peak  telomere 
##         7        16        34
# Species that changed of pattern among recombination/genes
telomere_pattern$species[which(telomere_pattern$pattern != telomere_pattern$gene_pattern)]
##  [1] "Brassica napus"     "Dioscorea alata"    "Eucalyptus grandis"
##  [4] "Malus domestica"    "Mangifera indica"   "Manihot esculenta" 
##  [7] "Panicum hallii"     "Prunus persica"     "Sesamum indicum"   
## [10] "Sorghum bicolor"    "Vitis vinifera"
knitr::kable(telomere_pattern, label = c("Species", "Recombination pattern", "Gene density pattern"), 
    caption = "genelandscapes-patterns", align = "c")
(#tab:SpeciesRecombination patternGene density pattern)genelandscapes-patterns
species pattern gene_pattern
Aegilops speltoides telomere NA
Arabidopsis thaliana exception exception
Arachis duranensis telomere telomere
Arachis hypogaea telomere telomere
Boechera stricta peak NA
Brachypodium distachyon telomere telomere
Brassica napus telomere exception
Brassica rapa peak peak
Camelina sativa telomere telomere
Camellia sinensis exception exception
Capsella rubella exception exception
Capsicum annuum telomere NA
Cenchrus americanus telomere NA
Citrullus lanatus telomere NA
Citrus sinensis telomere NA
Coffea canephora telomere NA
Cucumis melo telomere telomere
Cucumis sativus peak peak
Cucurbita maxima peak peak
Cucurbita pepo peak peak
Dioscorea alata telomere peak
Draba nivalis peak NA
Elaeis guineensis peak peak
Eucalyptus grandis exception telomere
Glycine max telomere telomere
Gossypium hirsutum telomere telomere
Gossypium raimondii telomere telomere
Helianthus annuus telomere telomere
Hordeum vulgare telomere telomere
Juglans regia peak NA
Lupinus albus telomere telomere
Lupinus angustifolius telomere NA
Malus domestica peak telomere
Mangifera indica peak telomere
Manihot esculenta peak telomere
Momordica charantia peak NA
Nelumbo nucifera exception NA
Oryza nivara telomere telomere
Oryza sativa telomere telomere
Panicum hallii peak telomere
Phaseolus vulgaris telomere telomere
Prunus mume telomere telomere
Prunus persica peak telomere
Quercus sp exception NA
Raphanus sativus exception NA
Sesamum indicum peak exception
Setaria italica telomere telomere
Solanum lycopersicum telomere telomere
Solanum tuberosum telomere telomere
Sorghum bicolor peak telomere
Theobroma cacao telomere telomere
Triticum aestivum telomere telomere
Triticum dicoccoides telomere NA
Triticum urartu telomere NA
Vigna unguiculata telomere telomere
Vitis vinifera telomere peak
Zea mays telomere telomere
(ref:DistancesRelative-pooledChromosomes-patterns-genecount) Gene counts are higher in distal regions and lower near the center of the chromosome. Standardized gene count as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Two patterns were identified based on the distribution of recombination rates along the telomere-centromere axis and species were pooled accordingly. Recombination rates decrease immediately from the tip of the chromosome for 35 species (dark grey line and ribbon) or recombination rate is reduced in telomeric regions and the peak of recombination is in a sub-telomeric region for 16 species (light grey line and ribbon). The solid line represents the mean gene count estimated in a bin and upper and lower boundaries of the ribbon represent the maximum and minimum values for a particular pattern. Besides, patterns that were not classified (6 species) were represented by loess regression in grey dashed lines. For estimating gene counts in bins of relative distance, chromosomes were split in halves, a distance of 0.5 being the center of the chromosome. Then, chromosomes were pooled per species.

Figure 5.7: (ref:DistancesRelative-pooledChromosomes-patterns-genecount) Gene counts are higher in distal regions and lower near the center of the chromosome. Standardized gene count as a function of the relative distance from the telomere along the chromosome (physical distances expressed in 20 bins). Two patterns were identified based on the distribution of recombination rates along the telomere-centromere axis and species were pooled accordingly. Recombination rates decrease immediately from the tip of the chromosome for 35 species (dark grey line and ribbon) or recombination rate is reduced in telomeric regions and the peak of recombination is in a sub-telomeric region for 16 species (light grey line and ribbon). The solid line represents the mean gene count estimated in a bin and upper and lower boundaries of the ribbon represent the maximum and minimum values for a particular pattern. Besides, patterns that were not classified (6 species) were represented by loess regression in grey dashed lines. For estimating gene counts in bins of relative distance, chromosomes were split in halves, a distance of 0.5 being the center of the chromosome. Then, chromosomes were pooled per species.

5.4.0.2 Figure 9

6 Genetic shuffling

6.1 Correlations with genetic shuffling rates

# Sample size
length(!is.na(chromosome.stats.shuffling$geneticshuffling_genedist))
hist(sqrt(chromosome.stats.shuffling$peripherybias_ratio), main = "", xlab = "Periphery-bias ratio (square root)", 
    breaks = 40)

hist(chromosome.stats.shuffling$geneticshuffling, main = "", xlab = "Genetic shuffling", 
    breaks = 40)

hist(chromosome.stats.shuffling$geneticshuffling_genedist, main = "", xlab = "Genetic shuffling", 
    breaks = 40)

6.1.1 Chromosomal level

Hypothesis: Crossovers clustered in distal regions are less efficient than crossovers evenly distributed in the chromosome.

lmer.model = lmer(geneticshuffling ~ (peripherybias_ratio) + (1 | species), data = chromosome.stats.shuffling)
summary(lmer.model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: geneticshuffling ~ (peripherybias_ratio) + (1 | species)
##    Data: chromosome.stats.shuffling
## 
## REML criterion at convergence: -1167.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1204 -0.5178 -0.0442  0.4157  6.8858 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 0.005632 0.07505 
##  Residual             0.002656 0.05153 
## Number of obs: 421, groups:  species, 40
## 
## Fixed effects:
##                       Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)           0.255838   0.013177  49.996897  19.416  < 2e-16 ***
## peripherybias_ratio  -0.013249   0.002194 408.850703  -6.038  3.5e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## prphrybs_rt -0.362
r.squaredGLMM(lmer.model)
##             R2m       R2c
## [1,] 0.05600436 0.6975317
# Confidence interval of the coefficients was computed by bootstrap Parametric
# bootstrap
lmer.confint = confint(lmer.model, level = 0.95, method = "boot", nsim = 1000, boot.type = "norm")
## Computing bootstrap confidence intervals ...
# Quadratic function and C.I.
lmerfun = function(x, c) {
    fixef(lmer.model)[[1]] + fixef(lmer.model)[[2]] * x
}
lowerlmerfun = function(x, c) {
    lmer.confint[[3]] + lmer.confint[[4]] * x
}
upperlmerfun = function(x, c) {
    lmer.confint[[7]] + lmer.confint[[8]] * x
}
# The C.I. values
df.new = data.frame(peripherybias_ratio = (chromosome.stats.shuffling$peripherybias_ratio))
df.new$lwr.pred = lowerlmerfun((chromosome.stats.shuffling$peripherybias_ratio))
df.new$upr.pred = upperlmerfun((chromosome.stats.shuffling$peripherybias_ratio))
(ref:shuffling-peripherybias)

Figure 6.1: (ref:shuffling-peripherybias)

6.1.1.1 Figure S9

As expected intuitively, larger genetic maps have a higher Genetic shuffling.

lmer.model = lmer(geneticshuffling ~ linkage.map.length.correctedHW + (1 | species), 
    data = chromosome.stats.shuffling)
summary(lmer.model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: geneticshuffling ~ linkage.map.length.correctedHW + (1 | species)
##    Data: chromosome.stats.shuffling
## 
## REML criterion at convergence: -1488.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0971 -0.4578 -0.0038  0.3746  7.0112 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 0.004415 0.06644 
##  Residual             0.001260 0.03550 
## Number of obs: 430, groups:  species, 40
## 
## Fixed effects:
##                                 Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)                    5.443e-02  1.319e-02 8.042e+01   4.127 8.93e-05
## linkage.map.length.correctedHW 1.393e-03  6.192e-05 4.238e+02  22.489  < 2e-16
##                                   
## (Intercept)                    ***
## linkage.map.length.correctedHW ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## lnkg.mp..HW -0.582
r.squaredGLMM(lmer.model)
##            R2m       R2c
## [1,] 0.4294228 0.8732853
(ref:shuffling-linkagemaplength)

Figure 6.2: (ref:shuffling-linkagemaplength)

6.1.1.2 Figure S8

6.1.2 Species level

We need information about chromosome relative sizes and all chromosomes to compute the chromosomal component Genetic shuffling for a species.

6.2 Gene distances instead of physical distances in Mb

6.2.0.1 Figure S10

Save gene densities for information in a Supplementary pdf…

6.2.0.2 Figure S11

Save Marey maps with gene distances in a Supplementary pdf…

Some examples of well chosen recombination landscapes in physical/gene distance.

(ref:marey-genedistances landscape_capsella, landscape_arabidopsis, landscape_malus, landscape_eucalyptus, landscape_oryza, landscape_zea)

Figure 6.3: (ref:marey-genedistances landscape_capsella, landscape_arabidopsis, landscape_malus, landscape_eucalyptus, landscape_oryza, landscape_zea)

6.2.0.3 Figure 10

Genetic shuffling should be (1) higher and (2) more homogeneous when estimated in gene distances rather than physical distances.

# Sample size
nrow(chromosome.stats.shuffling)
## [1] 430
length(unique(chromosome.stats.shuffling$species))
## [1] 40
summary(chromosome.stats.shuffling$geneticshuffling)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0323  0.1602  0.2095  0.2200  0.2676  0.7157
summary(chromosome.stats.shuffling$geneticshuffling_genedist)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1021  0.2055  0.2509  0.2655  0.2961  0.9186
# The strength of the difference in genetic shuffling rate between bp and gene
# distances
mean(chromosome.stats.shuffling$geneticshuffling_genedist, na.rm = TRUE) - mean(chromosome.stats.shuffling$geneticshuffling, 
    na.rm = TRUE)
## [1] 0.04546639
(mean(chromosome.stats.shuffling$geneticshuffling_genedist, na.rm = TRUE)/mean(chromosome.stats.shuffling$geneticshuffling, 
    na.rm = TRUE)) - 1
## [1] 0.2066655
# ggplot(data = chromosome.stats.shuffling, aes(x = geneticshuffling)) +
# geom_histogram(color = 'Red', alpha = 0.5, bins = 50) + geom_histogram(aes(x =
# geneticshuffling_genedist), color = 'Blue', alpha = 0.5, bins = 50)

ggplot(data = chromosome.stats.shuffling, aes(x = geneticshuffling)) + geom_density(color = "Red", 
    alpha = 0.5) + geom_density(aes(x = geneticshuffling_genedist), color = "Blue", 
    alpha = 0.5)

boxplot(chromosome.stats.shuffling$geneticshuffling, chromosome.stats.shuffling$geneticshuffling_genedist, 
    names = c("Physical distances", "Gene distances"))

plot(chromosome.stats.shuffling$geneticshuffling, chromosome.stats.shuffling$geneticshuffling_genedist, 
    xlab = "Genetic shuffling", ylab = "Genetic shuffling in gene distances")
abline(0, 1)

(ref:dist-shufflingrates)

Figure 6.4: (ref:dist-shufflingrates)

I made the difference in genetic shuffling rate when changing genomic distances for gene distances (i.e. geneticshuffling_genedist - geneticshuffling). A positive value indicates that considering gene distances increases the genetic shuffling rate.

# Permutation test of the significance of the difference in genetic shuffling
# rate
nboot = 1000
boot = numeric(nboot)
set.seed(42)
for (i in 1:nboot) {
    boot[i] = mean(sample((chromosome.stats.shuffling$geneticshuffling_genedist - 
        chromosome.stats.shuffling$geneticshuffling), replace = TRUE), na.rm = TRUE)
}
(mdiff = mean(boot))  # The mean difference in genetic shuffling rate (phys - gene distance)
## [1] 0.04540133
# i.e. a value under 0 means a higher genetic shuffling rate when considering
# gene distances Is it significant at 0.05
pthreshold = 0.001
(mquant = quantile(boot, c(pthreshold/2, 1 - pthreshold/2)))
##      0.05%     99.95% 
## 0.03843896 0.05311997
# The difference is small (though the original shuffling rate is not a large
# value) But the difference is highly significant
## 
## Call:
## lm(formula = diff ~ chrsize, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.07294 -0.02894 -0.01314  0.01741  0.15455 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.274e-02  8.936e-03   3.663 0.000756 ***
## chrsize     1.792e-04  5.424e-05   3.303 0.002089 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04847 on 38 degrees of freedom
## Multiple R-squared:  0.2231, Adjusted R-squared:  0.2026 
## F-statistic: 10.91 on 1 and 38 DF,  p-value: 0.002089
(ref:differences-shufflingrates)

Figure 6.5: (ref:differences-shufflingrates)

wilcox.test(chromosome.stats.shuffling$geneticshuffling, chromosome.stats.shuffling$geneticshuffling_genedist)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  chromosome.stats.shuffling$geneticshuffling and chromosome.stats.shuffling$geneticshuffling_genedist
## W = 64418, p-value = 1.404e-14
## alternative hypothesis: true location shift is not equal to 0

Longer chromosomes have a higher difference between genetic shuffling rates in bp/gene distances.

lmer.model = lmer((geneticshuffling_genedist - geneticshuffling) ~ log10(phys.map.length) + 
    (1 | species), data = chromosome.stats.shuffling)
summary(lmer.model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## (geneticshuffling_genedist - geneticshuffling) ~ log10(phys.map.length) +  
##     (1 | species)
##    Data: chromosome.stats.shuffling
## 
## REML criterion at convergence: -2032.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.5352 -0.5316  0.0722  0.5623  3.6205 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  species  (Intercept) 0.0018671 0.04321 
##  Residual             0.0003497 0.01870 
## Number of obs: 430, groups:  species, 40
## 
## Fixed effects:
##                          Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)             -0.045069   0.016333 189.910554  -2.759  0.00636 ** 
## log10(phys.map.length)   0.055999   0.008915 298.406164   6.281 1.19e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## lg10(phy..) -0.905
r.squaredGLMM(lmer.model)
##           R2m       R2c
## [1,] 0.209811 0.8753577
(ref:shuffling-differencesshufflingrates)

Figure 6.6: (ref:shuffling-differencesshufflingrates)

6.3 Convergence among patterns?

Is there convergence between patterns, i.e. between telomere patterns, gene patterns, gene distance maps and genetic shuffling?

pattern_convergence = telomere_pattern
# For each species, are gene distance maps more/less/similarly homogeneous than
# genomic distances?  For each chromosome, compute the RMSE for gene distances
# and genomic distances RMSE = sqrt(sum((Xobs - Xpred)^2))
chromosome.stats.shuffling$RMSEgenomic = NA
chromosome.stats.shuffling$RMSEgene = NA
for (i in 1:nrow(chromosome.stats.shuffling)) {
    map = read.table(paste(wd, "data-cleaned/genome/gene_distance/", chromosome.stats.shuffling$set[i], 
        ".txt.gz", sep = ""), header = TRUE)
    map = map[which(map$map == chromosome.stats.shuffling$chromosome[i]), ]
    # Relative distances
    map$phys = map$phys/max(map$phys, na.rm = TRUE)
    map$gene_distance = map$gene_distance/max(map$gene_distance, na.rm = TRUE)
    # Estimate predicted (i.e. diagonal line) The formula for a strictly linear line
    # is y = 0 + (max(y)/1)*x Because x is a relative distance (O to 1) So slope
    # is...
    slope = max(map$gen, na.rm = TRUE)
    # And predicted...
    map$predictedGenomic = slope * map$phys
    map$predictedGene = slope * map$gene_distance
    # Compute RMSE
    chromosome.stats.shuffling$RMSEgenomic[i] = sqrt(sum((map$gen - map$predictedGenomic)^2))
    chromosome.stats.shuffling$RMSEgene[i] = sqrt(sum((map$gen - map$predictedGene)^2))
    rm(map)
}
chromosome.stats.shuffling$diffRMSE = chromosome.stats.shuffling$RMSEgene - chromosome.stats.shuffling$RMSEgenomic

# for each species, estimate the mean difference between RMSEs And classify as
# less/more homogeneous (>0/<0)
pattern_convergence$diffRMSE = NA
pattern_convergence$genedist_pattern = NA
for (i in 1:nrow(pattern_convergence)) {
    pattern_convergence$diffRMSE[i] = mean(chromosome.stats.shuffling$diffRMSE[which(chromosome.stats.shuffling$species == 
        pattern_convergence$species[i])], na.rm = TRUE)

    if (pattern_convergence$diffRMSE[i] == "NaN") {
        pattern_convergence$diffRMSE[i] = NA
    } else {
        if (pattern_convergence$diffRMSE[i] > 0) {
            pattern_convergence$genedist_pattern[i] = "less"
        } else {
            if (pattern_convergence$diffRMSE[i] < 0) {
                pattern_convergence$genedist_pattern[i] = "more"
            }
        }
    }

}


# For each species, are differences in genetic shuffling positive (increased) or
# negative (decreased)?  geneticshuffling_genedist - geneticshuffling
pattern_convergence$diffgeneticshuffl = NA
for (i in 1:nrow(pattern_convergence)) {
    pattern_convergence$diffgeneticshuffl[i] = mean(chromosome.stats.shuffling$geneticshuffling_genedist[which(chromosome.stats.shuffling$species == 
        pattern_convergence$species[i])] - chromosome.stats.shuffling$geneticshuffling[which(chromosome.stats.shuffling$species == 
        pattern_convergence$species[i])], na.rm = TRUE)
    if (pattern_convergence$diffgeneticshuffl[i] == "NaN") {
        pattern_convergence$diffgeneticshuffl[i] = NA
    }
}

# Mean chromosome size
pattern_convergence$meanchrsize = NA
for (i in 1:nrow(pattern_convergence)) {
    pattern_convergence$meanchrsize[i] = mean(chromosome.stats$phys.map.length[which(chromosome.stats$species == 
        pattern_convergence$species[i])], na.rm = TRUE)
}
# Number of maps classified as more homogeneous in gene distances:
sum(chromosome.stats.shuffling$diffRMSE < 0, na.rm = TRUE)
## [1] 303
# Proportion:
sum(chromosome.stats.shuffling$diffRMSE < 0, na.rm = TRUE)/sum(!is.na(chromosome.stats.shuffling$diffRMSE), 
    na.rm = TRUE)
## [1] 0.7481481
# Total:
sum(!is.na(chromosome.stats.shuffling$diffRMSE), na.rm = TRUE)
## [1] 405
# Cross-tables for checking convergence
table(pattern_convergence$gene_pattern, pattern_convergence$pattern)
##            
##             exception peak telomere
##   exception         3    1        1
##   peak              0    5        2
##   telomere          1    6       22
# How many species more and less?
table(pattern_convergence$genedist_pattern)
## 
## less more 
##    9   30
# Interaction telomere pattern and gene distance
table(pattern_convergence$gene_pattern, pattern_convergence$genedist_pattern)
##            
##             less more
##   exception    3    2
##   peak         4    3
##   telomere     2   25
# Species that changed of pattern among recombination/genes
pattern_convergence$species[which(pattern_convergence$pattern != pattern_convergence$gene_pattern)]
##  [1] "Brassica napus"     "Dioscorea alata"    "Eucalyptus grandis"
##  [4] "Malus domestica"    "Mangifera indica"   "Manihot esculenta" 
##  [7] "Panicum hallii"     "Prunus persica"     "Sesamum indicum"   
## [10] "Sorghum bicolor"    "Vitis vinifera"
# Cross-tables for checking convergence
table(pattern_convergence$gene_pattern)
## 
## exception      peak  telomere 
##         5         7        29
table(pattern_convergence$genedist_pattern)
## 
## less more 
##    9   30
chisq.test(table(pattern_convergence$genedist_pattern, pattern_convergence$gene_pattern))
## Warning in chisq.test(table(pattern_convergence$genedist_pattern,
## pattern_convergence$gene_pattern)): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  table(pattern_convergence$genedist_pattern, pattern_convergence$gene_pattern)
## X-squared = 12.151, df = 2, p-value = 0.002299
levels(pattern_convergence$pattern) = c("exception", "sub-telomere", "telomere")

knitr::kable(pattern_convergence, label = c("Species", "Recombination pattern", "Gene density pattern", 
    "RMSE difference", "Homogeneization effect of gene distance", "Genetic shuffling difference", 
    "Mean chromosome size"), caption = "genelandscapes-patterns", align = "c")
(#tab:SpeciesRecombination patternGene density patternRMSE differenceHomogeneization effect of gene distanceGenetic shuffling differenceMean chromosome size)genelandscapes-patterns
species pattern gene_pattern diffRMSE genedist_pattern diffgeneticshuffl meanchrsize
Aegilops speltoides telomere NA NA NA NA 517.44324
Arabidopsis thaliana exception exception 30.5745792 less 0.0064004 23.82927
Arachis duranensis telomere telomere NA NA 0.0579436 100.95097
Arachis hypogaea telomere telomere -614.0625554 more 0.2072079 124.68497
Boechera stricta sub-telomere NA NA NA NA 26.52555
Brachypodium distachyon telomere telomere -132.9239396 more 0.0320681 45.62172
Brassica napus telomere exception -24.6280031 more 0.0224256 44.71564
Brassica rapa sub-telomere peak -64.6465361 more 0.0371777 26.14066
Camelina sativa telomere telomere -21.9977716 more 0.0098442 30.63062
Camellia sinensis exception exception -0.7685128 more 0.0128694 222.10960
Capsella rubella exception exception 28.7411007 less -0.0051357 15.83906
Capsicum annuum telomere NA NA NA NA 227.12175
Cenchrus americanus telomere NA NA NA NA 223.50536
Citrullus lanatus telomere NA NA NA NA 33.59728
Citrus sinensis telomere NA NA NA NA 26.02017
Coffea canephora telomere NA NA NA NA 53.00321
Cucumis melo telomere telomere -76.2766136 more 0.0580861 30.99121
Cucumis sativus sub-telomere peak 17.0914585 less 0.0256670 27.40843
Cucurbita maxima sub-telomere peak 123.9720248 less 0.0020601 10.34886
Cucurbita pepo sub-telomere peak -26.7803739 more 0.0096076 10.78700
Dioscorea alata telomere peak 5.6181083 less 0.0059156 23.84612
Draba nivalis sub-telomere NA NA NA NA 34.11541
Elaeis guineensis sub-telomere peak 58.4634143 less -0.0004549 46.94000
Eucalyptus grandis exception telomere -52.1347009 more 0.0159960 55.68998
Glycine max telomere telomere -153.0590573 more 0.0691302 48.12719
Gossypium hirsutum telomere telomere -92.0853380 more 0.0647436 85.92276
Gossypium raimondii telomere telomere -894.6366733 more 0.0720796 57.63293
Helianthus annuus telomere telomere -92.8265579 more 0.0291271 169.70400
Hordeum vulgare telomere telomere -462.9124143 more 0.1904018 654.85949
Juglans regia sub-telomere NA NA NA NA 47.43156
Lupinus albus telomere telomere -21.5359754 more 0.0291837 17.35603
Lupinus angustifolius telomere NA NA NA NA 27.89545
Malus domestica sub-telomere telomere -162.8282246 more 0.0206064 38.38249
Mangifera indica sub-telomere telomere NA NA NA 17.87202
Manihot esculenta sub-telomere telomere 486.9496847 less -0.0099806 28.80229
Momordica charantia sub-telomere NA NA NA NA 26.50729
Nelumbo nucifera exception NA NA NA NA 97.84551
Oryza nivara telomere telomere -17.1467421 more 0.0138083 28.16253
Oryza sativa telomere telomere -132.8990325 more 0.0236937 31.10379
Panicum hallii sub-telomere telomere -1922.1715758 more 0.0865313 56.37928
Phaseolus vulgaris telomere telomere -189.3509884 more 0.0634829 46.80187
Prunus mume telomere telomere 0.7466590 less 0.0037906 24.98833
Prunus persica sub-telomere telomere -62.8750625 more 0.0135736 28.21185
Quercus sp exception NA NA NA NA 59.72765
Raphanus sativus exception NA NA NA NA 38.78895
Sesamum indicum sub-telomere exception 12.6191473 less 0.0113517 16.40985
Setaria italica telomere telomere -115.0067495 more 0.0376572 44.58849
Solanum lycopersicum telomere telomere -289.8449953 more 0.1352258 67.26872
Solanum tuberosum telomere telomere -101.5698020 more 0.0529184 60.41812
Sorghum bicolor sub-telomere telomere -113.0771627 more 0.1995367 68.36450
Theobroma cacao telomere telomere -189.3678093 more 0.0415695 33.04562
Triticum aestivum telomere telomere -508.7577718 more 0.0794583 667.82536
Triticum dicoccoides telomere NA NA NA NA 582.40116
Triticum urartu telomere NA NA NA NA 665.94253
Vigna unguiculata telomere telomere -354.5383861 more 0.0511767 43.04148
Vitis vinifera telomere peak -82.5057959 more 0.0211167 22.43032
Zea mays telomere telomere -372.1453979 more 0.1188828 210.63381
df = pattern_convergence
colnames(df) = c("Species", "Recombination pattern", "Gene density pattern", "RMSE difference", 
    "Homogeneization effect of gene distance", "Genetic shuffling difference", "Mean chromosome size")
write.xlsx(x = df, file = paste(wd, "tables/article_one_supp/tables_supplementary.xls", 
    sep = ""), sheetName = "S11_ConvergencePatterns", row.names = FALSE, append = TRUE)
rm(df)
ggplot(data = pattern_convergence, aes(x = pattern, y = diffgeneticshuffl)) + geom_violin(aes(colour = pattern, 
    fill = pattern)) + geom_dotplot(aes(), binaxis = "y", stackdir = "center", dotsize = 0.5, 
    fill = 1) + scale_color_manual(values = c("black", colorpalette[2], colorpalette[1])) + 
    scale_fill_manual(values = alpha(c("black", colorpalette[2], colorpalette[1]), 
        0.5)) + xlab("Crossover pattern") + ylab("Difference in genetic shuffling") + 
    theme(axis.line = element_line(colour = "black"), panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(), 
        plot.title = element_text(color = "black", size = 14, face = "bold.italic", 
            hjust = 0.5), axis.title.x = element_text(color = "black", size = 14), 
        axis.title.y = element_text(color = "black", size = 14), axis.text = element_text(size = 14, 
            colour = "black"), legend.key = element_rect(fill = "white", size = 1), 
        legend.key.height = unit(1, "line"), legend.key.width = unit(2, "line"), 
        legend.text = element_text(size = 7, face = "italic"), legend.title = element_blank(), 
        legend.position = "none")
## Warning: Removed 17 rows containing non-finite values (stat_ydensity).
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 17 rows containing non-finite values (stat_bindot).

# plot(diffgeneticshuffl ~ log(meanchrsize), data = pattern_convergence)
## 
## Call:
## lm(formula = diff ~ chrsize, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.07294 -0.02894 -0.01314  0.01741  0.15455 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.274e-02  8.936e-03   3.663 0.000756 ***
## chrsize     1.792e-04  5.424e-05   3.303 0.002089 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04847 on 38 degrees of freedom
## Multiple R-squared:  0.2231, Adjusted R-squared:  0.2026 
## F-statistic: 10.91 on 1 and 38 DF,  p-value: 0.002089
(ref:differences-shufflingrates)

Figure 6.7: (ref:differences-shufflingrates)

(ref:dist-shufflingrates) (ref:dist-shufflingrates)

6.3.0.1 Figure 11

7 Packages & Session Info

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] egg_0.4.5           ggrepel_0.9.1       ggtext_0.1.1       
##  [4] magick_2.7.2        gridExtra_2.3       xlsx_0.6.5         
##  [7] data.table_1.14.0   ggnewscale_0.4.5    treeio_1.14.3      
## [10] ggtree_2.4.1        MuMIn_1.43.17       pbmcapply_1.5.0    
## [13] reldist_1.6-6       lmerTest_3.1-3      plyr_1.8.6         
## [16] ape_5.4-1           car_3.0-10          carData_3.0-4      
## [19] kableExtra_1.3.4    phyr_1.1.0          INLA_21.02.23      
## [22] sjmisc_2.8.6        sjPlot_2.8.7        lme4_1.1-26        
## [25] Matrix_1.3-2        cowplot_1.1.1       pals_1.6           
## [28] ggplotify_0.0.5     RColorBrewer_1.1-2  reshape2_1.4.4     
## [31] ggpubr_0.4.0        purrr_0.3.4         rBLAST_0.99.2      
## [34] Biostrings_2.58.0   XVector_0.30.0      IRanges_2.24.1     
## [37] S4Vectors_0.28.1    BiocGenerics_0.36.0 rentrez_1.2.3      
## [40] dplyr_1.0.5         taxizedb_0.3.0      taxize_0.9.99      
## [43] gdata_2.18.0        stringr_1.4.0       MareyMap_1.3.6     
## [46] ggplot2_3.3.3       ade4_1.7-16         rstudioapi_0.13    
## [49] bookdown_0.21       knitr_1.38         
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.1          tidyselect_1.1.0    htmlwidgets_1.5.3  
##   [4] RSQLite_2.2.4       grid_4.0.4          hoardr_0.5.2       
##   [7] munsell_0.5.0       codetools_0.2-18    effectsize_0.4.4   
##  [10] statmod_1.4.35      withr_2.4.1         colorspace_2.0-0   
##  [13] highr_0.8           uuid_0.1-4          ggsignif_0.6.1     
##  [16] rJava_0.9-13        labeling_0.4.2      emmeans_1.5.4      
##  [19] conditionz_0.1.0    farver_2.1.0        bit64_4.0.5        
##  [22] coda_0.19-4         vctrs_0.3.8         generics_0.1.0     
##  [25] xfun_0.30           markdown_1.1        R6_2.5.0           
##  [28] cachem_1.0.4        reshape_0.8.8       gridGraphics_0.5-1 
##  [31] assertthat_0.2.1    scales_1.1.1        nnet_7.3-15        
##  [34] gtable_0.3.0        bold_1.1.0          rlang_1.0.2        
##  [37] systemfonts_1.0.1   splines_4.0.4       lazyeval_0.2.2     
##  [40] rstatix_0.7.0       dichromat_2.0-0     checkmate_2.0.0    
##  [43] broom_0.7.12        BiocManager_1.30.10 yaml_2.2.1         
##  [46] abind_1.4-5         modelr_0.1.8        backports_1.2.1    
##  [49] Hmisc_4.5-0         gridtext_0.1.4      tools_4.0.4        
##  [52] tcltk_4.0.4         ellipsis_0.3.2      jquerylib_0.1.3    
##  [55] Rcpp_1.0.6          base64enc_0.1-3     progress_1.2.2     
##  [58] zlibbioc_1.36.0     prettyunits_1.1.1   rpart_4.1-15       
##  [61] zoo_1.8-9           haven_2.3.1         cluster_2.1.1      
##  [64] crul_1.1.0          magrittr_2.0.1      openxlsx_4.2.3     
##  [67] mvtnorm_1.1-1       xlsxjars_0.6.1      patchwork_1.1.1    
##  [70] hms_1.0.0           evaluate_0.15       xtable_1.8-4       
##  [73] XML_3.99-0.5        rio_0.5.26          sjstats_0.18.1     
##  [76] jpeg_0.1-8.1        readxl_1.3.1        ggeffects_1.0.1    
##  [79] compiler_4.0.4      tibble_3.1.0        maps_3.3.0         
##  [82] crayon_1.4.1        minqa_1.2.4         htmltools_0.5.1.1  
##  [85] mgcv_1.8-34         Formula_1.2-4       aplot_0.0.6        
##  [88] tidyr_1.1.3         DBI_1.1.1           formatR_1.8        
##  [91] sjlabelled_1.1.7    dbplyr_2.1.1        MASS_7.3-53.1      
##  [94] rappdirs_0.3.3      boot_1.3-27         cli_3.2.0          
##  [97] insight_0.13.1      forcats_0.5.1       pkgconfig_2.0.3    
## [100] rvcheck_0.1.8       numDeriv_2016.8-1.1 foreign_0.8-81     
## [103] sp_1.4-5            xml2_1.3.3          foreach_1.5.1      
## [106] svglite_2.0.0       bslib_0.2.4         webshot_0.5.2      
## [109] estimability_1.3    rvest_1.0.0         digest_0.6.27      
## [112] parameters_0.12.0   httpcode_0.3.0      rmarkdown_2.7      
## [115] cellranger_1.1.0    tidytree_0.3.3      htmlTable_2.1.0    
## [118] curl_4.3            gtools_3.8.2        nloptr_1.2.2.2     
## [121] lifecycle_1.0.0     nlme_3.1-152        jsonlite_1.7.2     
## [124] mapproj_1.2.7       viridisLite_0.3.0   fansi_0.4.2        
## [127] pillar_1.7.0        lattice_0.20-41     fastmap_1.1.0      
## [130] httr_1.4.2          survival_3.2-9      glue_1.6.2         
## [133] bayestestR_0.8.2    zip_2.1.1           png_0.1-7          
## [136] iterators_1.0.13    bit_4.0.4           stringi_1.7.6      
## [139] sass_0.3.1          performance_0.7.0   blob_1.2.1         
## [142] latticeExtra_0.6-29 memoise_2.0.0
# Citations for libraries
lapply(packages, function(x) citation(x))
## [[1]]
## 
## To cite package 'rstudioapi' in publications use:
## 
##   Kevin Ushey, JJ Allaire, Hadley Wickham and Gary Ritchie (2020).
##   rstudioapi: Safely Access the RStudio API. R package version 0.13.
##   https://CRAN.R-project.org/package=rstudioapi
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {rstudioapi: Safely Access the RStudio API},
##     author = {Kevin Ushey and JJ Allaire and Hadley Wickham and Gary Ritchie},
##     year = {2020},
##     note = {R package version 0.13},
##     url = {https://CRAN.R-project.org/package=rstudioapi},
##   }
## 
## 
## [[2]]
## 
## To cite ade4 in publications use:
## 
## Dray S, Dufour A (2007). "The ade4 Package: Implementing the Duality
## Diagram for Ecologists." _Journal of Statistical Software_, *22*(4),
## 1-20. doi: 10.18637/jss.v022.i04 (URL:
## https://doi.org/10.18637/jss.v022.i04).
## 
## Bougeard S, Dray S (2018). "Supervised Multiblock Analysis in R with
## the ade4 Package." _Journal of Statistical Software_, *86*(1), 1-17.
## doi: 10.18637/jss.v086.i01 (URL:
## https://doi.org/10.18637/jss.v086.i01).
## 
## Chessel D, Dufour A, Thioulouse J (2004). "The ade4 Package - I:
## One-Table Methods." _R News_, *4*(1), 5-10. <URL:
## https://cran.r-project.org/doc/Rnews/>.
## 
## Dray S, Dufour A, Chessel D (2007). "The ade4 Package - II: Two-Table
## and K-Table Methods." _R News_, *7*(2), 47-52. <URL:
## https://cran.r-project.org/doc/Rnews/>.
## 
## Thioulouse J, Dray S, Dufour A, Siberchicot A, Jombart T, Pavoine S
## (2018). _Multivariate Analysis of Ecological Data with ade4_. Springer.
## doi: 10.1007/978-1-4939-8850-1 (URL:
## https://doi.org/10.1007/978-1-4939-8850-1).
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
## 
## 
## [[3]]
## 
## To cite ggplot2 in publications, please use:
## 
##   H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
##   Springer-Verlag New York, 2016.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     author = {Hadley Wickham},
##     title = {ggplot2: Elegant Graphics for Data Analysis},
##     publisher = {Springer-Verlag New York},
##     year = {2016},
##     isbn = {978-3-319-24277-4},
##     url = {https://ggplot2.tidyverse.org},
##   }
## 
## 
## [[4]]
## 
## To cite package 'MareyMap' in publications use:
## 
##   Aurélie Siberchicot, Clément Rezvoy, Delphine Charif, Laurent Gueguen
##   and Gabriel Marais (2020). MareyMap: Estimation of Meiotic
##   Recombination Rates Using Marey Maps. R package version 1.3.6.
##   https://CRAN.R-project.org/package=MareyMap
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {MareyMap: Estimation of Meiotic Recombination Rates Using Marey Maps},
##     author = {Aurélie Siberchicot and Clément Rezvoy and Delphine Charif and Laurent Gueguen and Gabriel Marais},
##     year = {2020},
##     note = {R package version 1.3.6},
##     url = {https://CRAN.R-project.org/package=MareyMap},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## 
## [[5]]
## 
## To cite package 'stringr' in publications use:
## 
##   Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for
##   Common String Operations. R package version 1.4.0.
##   https://CRAN.R-project.org/package=stringr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {stringr: Simple, Consistent Wrappers for Common String Operations},
##     author = {Hadley Wickham},
##     year = {2019},
##     note = {R package version 1.4.0},
##     url = {https://CRAN.R-project.org/package=stringr},
##   }
## 
## 
## [[6]]
## 
## To cite package 'gdata' in publications use:
## 
##   Gregory R. Warnes, Ben Bolker, Gregor Gorjanc, Gabor Grothendieck,
##   Ales Korosec, Thomas Lumley, Don MacQueen, Arni Magnusson, Jim Rogers
##   and others (2017). gdata: Various R Programming Tools for Data
##   Manipulation. R package version 2.18.0.
##   https://CRAN.R-project.org/package=gdata
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {gdata: Various R Programming Tools for Data Manipulation},
##     author = {Gregory R. Warnes and Ben Bolker and Gregor Gorjanc and Gabor Grothendieck and Ales Korosec and Thomas Lumley and Don MacQueen and Arni Magnusson and Jim Rogers and others},
##     year = {2017},
##     note = {R package version 2.18.0},
##     url = {https://CRAN.R-project.org/package=gdata},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## 
## [[7]]
## 
## To cite taxize in publications use at least the first, if not both:
## 
##   Scott Chamberlain and Eduard Szocs (2013). taxize - taxonomic search
##   and retrieval in R. F1000Research, 2:191. URL:
##   https://f1000research.com/articles/2-191/v2
## 
##   Scott Chamberlain, Eduard Szoecs, Zachary Foster, Zebulun Arendsee,
##   Carl Boettiger, Karthik Ram, Ignasi Bartomeus, John Baumgartner,
##   James O'Donnell, Jari Oksanen, Bastian Greshake Tzovaras, Philippe
##   Marchand, Vinh Tran, Maëlle Salmon, Gaopeng Li, and Matthias Grenié.
##   (2020) taxize: Taxonomic information from around the web. R package
##   version 0.9.98. https://github.com/ropensci/taxize
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
## 
## 
## [[8]]
## 
## To cite package 'taxizedb' in publications use:
## 
##   Scott Chamberlain and Zebulun Arendsee (2021). taxizedb: Tools for
##   Working with 'Taxonomic' Databases. R package version 0.3.0.
##   https://CRAN.R-project.org/package=taxizedb
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {taxizedb: Tools for Working with 'Taxonomic' Databases},
##     author = {Scott Chamberlain and Zebulun Arendsee},
##     year = {2021},
##     note = {R package version 0.3.0},
##     url = {https://CRAN.R-project.org/package=taxizedb},
##   }
## 
## 
## [[9]]
## 
## To cite package 'dplyr' in publications use:
## 
##   Hadley Wickham, Romain François, Lionel Henry and Kirill Müller
##   (2021). dplyr: A Grammar of Data Manipulation. R package version
##   1.0.5. https://CRAN.R-project.org/package=dplyr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {dplyr: A Grammar of Data Manipulation},
##     author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
##     year = {2021},
##     note = {R package version 1.0.5},
##     url = {https://CRAN.R-project.org/package=dplyr},
##   }
## 
## 
## [[10]]
## 
## To cite rentrez in publications use:
## 
##   Winter, D. J. (2017) rentrez: an R package for the NCBI eUtils API
##   The R Journal 9(2):520-526
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {{rentrez}: an R package for the NCBI eUtils API},
##     author = {David J. Winter},
##     journal = {The R Journal},
##     year = {2017},
##     volume = {9},
##     issue = {2},
##     pages = {520--526},
##   }
## 
## 
## [[11]]
## 
## To cite package 'rBLAST' in publications use:
## 
##   Michael Hahsler and Anurag Nagar (2019). rBLAST: Interface to the
##   Basic Local Alignment Search Tool (BLAST). R package version 0.99.2.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {rBLAST: Interface to the Basic Local Alignment Search Tool (BLAST)},
##     author = {Michael Hahsler and Anurag Nagar},
##     year = {2019},
##     note = {R package version 0.99.2},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## 
## [[12]]
## 
## To cite package 'Biostrings' in publications use:
## 
##   H. Pagès, P. Aboyoun, R. Gentleman and S. DebRoy (2021). Biostrings:
##   Efficient manipulation of biological strings. R package version
##   2.58.0. https://bioconductor.org/packages/Biostrings
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {Biostrings: Efficient manipulation of biological strings},
##     author = {H. Pagès and P. Aboyoun and R. Gentleman and S. DebRoy},
##     year = {2021},
##     note = {R package version 2.58.0},
##     url = {https://bioconductor.org/packages/Biostrings},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## 
## [[13]]
## 
## To cite package 'purrr' in publications use:
## 
##   Lionel Henry and Hadley Wickham (2020). purrr: Functional Programming
##   Tools. R package version 0.3.4.
##   https://CRAN.R-project.org/package=purrr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {purrr: Functional Programming Tools},
##     author = {Lionel Henry and Hadley Wickham},
##     year = {2020},
##     note = {R package version 0.3.4},
##     url = {https://CRAN.R-project.org/package=purrr},
##   }
## 
## 
## [[14]]
## 
## To cite package 'ggpubr' in publications use:
## 
##   Alboukadel Kassambara (2020). ggpubr: 'ggplot2' Based Publication
##   Ready Plots. R package version 0.4.0.
##   https://CRAN.R-project.org/package=ggpubr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggpubr: 'ggplot2' Based Publication Ready Plots},
##     author = {Alboukadel Kassambara},
##     year = {2020},
##     note = {R package version 0.4.0},
##     url = {https://CRAN.R-project.org/package=ggpubr},
##   }
## 
## 
## [[15]]
## 
## To cite reshape2 in publications use:
## 
##   Hadley Wickham (2007). Reshaping Data with the reshape Package.
##   Journal of Statistical Software, 21(12), 1-20. URL
##   http://www.jstatsoft.org/v21/i12/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Reshaping Data with the {reshape} Package},
##     author = {Hadley Wickham},
##     journal = {Journal of Statistical Software},
##     year = {2007},
##     volume = {21},
##     number = {12},
##     pages = {1--20},
##     url = {http://www.jstatsoft.org/v21/i12/},
##   }
## 
## 
## [[16]]
## 
## To cite package 'RColorBrewer' in publications use:
## 
##   Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package
##   version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {RColorBrewer: ColorBrewer Palettes},
##     author = {Erich Neuwirth},
##     year = {2014},
##     note = {R package version 1.1-2},
##     url = {https://CRAN.R-project.org/package=RColorBrewer},
##   }
## 
## 
## [[17]]
## 
## To cite package 'ggplotify' in publications use:
## 
##   Guangchuang Yu (2020). ggplotify: Convert Plot to 'grob' or 'ggplot'
##   Object. R package version 0.0.5.
##   https://CRAN.R-project.org/package=ggplotify
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggplotify: Convert Plot to 'grob' or 'ggplot' Object},
##     author = {Guangchuang Yu},
##     year = {2020},
##     note = {R package version 0.0.5},
##     url = {https://CRAN.R-project.org/package=ggplotify},
##   }
## 
## 
## [[18]]
## 
## To cite package 'pals' in publications use:
## 
##   Kevin Wright (2019). pals: Color Palettes, Colormaps, and Tools to
##   Evaluate Them. R package version 1.6.
##   https://CRAN.R-project.org/package=pals
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {pals: Color Palettes, Colormaps, and Tools to Evaluate Them},
##     author = {Kevin Wright},
##     year = {2019},
##     note = {R package version 1.6},
##     url = {https://CRAN.R-project.org/package=pals},
##   }
## 
## 
## [[19]]
## 
## To cite package 'cowplot' in publications use:
## 
##   Claus O. Wilke (2020). cowplot: Streamlined Plot Theme and Plot
##   Annotations for 'ggplot2'. R package version 1.1.1.
##   https://CRAN.R-project.org/package=cowplot
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {cowplot: Streamlined Plot Theme and Plot Annotations for 'ggplot2'},
##     author = {Claus O. Wilke},
##     year = {2020},
##     note = {R package version 1.1.1},
##     url = {https://CRAN.R-project.org/package=cowplot},
##   }
## 
## 
## [[20]]
## 
## To cite lme4 in publications use:
## 
##   Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
##   Fitting Linear Mixed-Effects Models Using lme4. Journal of
##   Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Fitting Linear Mixed-Effects Models Using {lme4}},
##     author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
##     journal = {Journal of Statistical Software},
##     year = {2015},
##     volume = {67},
##     number = {1},
##     pages = {1--48},
##     doi = {10.18637/jss.v067.i01},
##   }
## 
## 
## [[21]]
## 
## Lüdecke D (2021). _sjPlot: Data Visualization for Statistics in Social
## Science_. R package version 2.8.7, <URL:
## https://CRAN.R-project.org/package=sjPlot>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {sjPlot: Data Visualization for Statistics in Social Science},
##     author = {Daniel Lüdecke},
##     year = {2021},
##     note = {R package version 2.8.7},
##     url = {https://CRAN.R-project.org/package=sjPlot},
##   }
## 
## 
## [[22]]
## 
## Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions."
## _Journal of Open Source Software_, *3*(26), 754. doi:
## 10.21105/joss.00754 (URL: https://doi.org/10.21105/joss.00754).
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {sjmisc: Data and Variable Transformation Functions.},
##     volume = {3},
##     doi = {10.21105/joss.00754},
##     number = {26},
##     journal = {Journal of Open Source Software},
##     author = {Daniel Lüdecke},
##     year = {2018},
##     pages = {754},
##   }
## 
## 
## [[23]]
## 
## To cite package 'phyr' in publications use:
## 
##   Anthony Ives, Russell Dinnage, Lucas A. Nell, Matthew Helmus and
##   Daijiang Li (2020). phyr: Model Based Phylogenetic Analysis. R
##   package version 1.1.0. https://CRAN.R-project.org/package=phyr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {phyr: Model Based Phylogenetic Analysis},
##     author = {Anthony Ives and Russell Dinnage and Lucas A. Nell and Matthew Helmus and Daijiang Li},
##     year = {2020},
##     note = {R package version 1.1.0},
##     url = {https://CRAN.R-project.org/package=phyr},
##   }
## 
## 
## [[24]]
## 
## To cite package 'kableExtra' in publications use:
## 
##   Hao Zhu (2021). kableExtra: Construct Complex Table with 'kable' and
##   Pipe Syntax. R package version 1.3.4.
##   https://CRAN.R-project.org/package=kableExtra
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {kableExtra: Construct Complex Table with 'kable' and Pipe Syntax},
##     author = {Hao Zhu},
##     year = {2021},
##     note = {R package version 1.3.4},
##     url = {https://CRAN.R-project.org/package=kableExtra},
##   }
## 
## 
## [[25]]
## 
## To cite the car package in publications use:
## 
##   John Fox and Sanford Weisberg (2019). An {R} Companion to Applied
##   Regression, Third Edition. Thousand Oaks CA: Sage. URL:
##   https://socialsciences.mcmaster.ca/jfox/Books/Companion/
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     title = {An {R} Companion to Applied Regression},
##     edition = {Third},
##     author = {John Fox and Sanford Weisberg},
##     year = {2019},
##     publisher = {Sage},
##     address = {Thousand Oaks {CA}},
##     url = {https://socialsciences.mcmaster.ca/jfox/Books/Companion/},
##   }
## 
## 
## [[26]]
## 
## To cite ape in a publication please use:
## 
##   Paradis E. & Schliep K. 2019. ape 5.0: an environment for modern
##   phylogenetics and evolutionary analyses in R. Bioinformatics 35:
##   526-528.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {ape 5.0: an environment for modern phylogenetics and evolutionary analyses in {R}},
##     author = {E. Paradis and K. Schliep},
##     journal = {Bioinformatics},
##     year = {2019},
##     volume = {35},
##     pages = {526-528},
##   }
## 
## As ape is evolving quickly, you may want to cite also its version
## number (found with 'library(help = ape)' or 'packageVersion("ape")').
## 
## 
## [[27]]
## 
## To cite plyr in publications use:
## 
##   Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data
##   Analysis. Journal of Statistical Software, 40(1), 1-29. URL
##   http://www.jstatsoft.org/v40/i01/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {The Split-Apply-Combine Strategy for Data Analysis},
##     author = {Hadley Wickham},
##     journal = {Journal of Statistical Software},
##     year = {2011},
##     volume = {40},
##     number = {1},
##     pages = {1--29},
##     url = {http://www.jstatsoft.org/v40/i01/},
##   }
## 
## 
## [[28]]
## 
## To cite lmerTest in publications use:
## 
## Kuznetsova A, Brockhoff PB, Christensen RHB (2017). "lmerTest Package:
## Tests in Linear Mixed Effects Models." _Journal of Statistical
## Software_, *82*(13), 1-26. doi: 10.18637/jss.v082.i13 (URL:
## https://doi.org/10.18637/jss.v082.i13).
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {{lmerTest} Package: Tests in Linear Mixed Effects Models},
##     author = {Alexandra Kuznetsova and Per B. Brockhoff and Rune H. B. Christensen},
##     journal = {Journal of Statistical Software},
##     year = {2017},
##     volume = {82},
##     number = {13},
##     pages = {1--26},
##     doi = {10.18637/jss.v082.i13},
##   }
## 
## 
## [[29]]
## 
## If you are using ideas from the 'Relative Distribution Methods' book
## for research that will be published, we request that you acknowledge
## this by citing the book as shown below.
## 
## If you use this 'reldist' package, please also use the second citation
## below.
## 
## For BibTeX format, use toBibtex(citation("reldist")).
## 
##   Mark S. Handcock and Martina Morris (1999) Relative Distribution
##   Methods in the Social Sciences. Springer, New York, ISBN
##   0-387-98778-9. URL http://www.stat.ucla.edu/~handcock/RelDist
## 
##   Mark S. Handcock (2016), Relative Distribution Methods. Version
##   1.6-6. Project home page at
##   http://www.stat.ucla.edu/~handcock/RelDist. URL
##   https://CRAN.R-project.org/package=reldist.
## 
## We have invested a lot of time and effort in creating the 'reldist'
## package for use by other researchers. Please cite it in all papers
## where it is used.
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
## 
## 
## [[30]]
## 
## To cite package 'pbmcapply' in publications use:
## 
##   Kevin Kuang, Quyu Kong and Francesco Napolitano (2019). pbmcapply:
##   Tracking the Progress of Mc*pply with Progress Bar. R package version
##   1.5.0. https://CRAN.R-project.org/package=pbmcapply
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {pbmcapply: Tracking the Progress of Mc*pply with Progress Bar},
##     author = {Kevin Kuang and Quyu Kong and Francesco Napolitano},
##     year = {2019},
##     note = {R package version 1.5.0},
##     url = {https://CRAN.R-project.org/package=pbmcapply},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## 
## [[31]]
## 
## To cite package 'MuMIn' in publications use:
## 
##   Kamil Bartoń (2020). MuMIn: Multi-Model Inference. R package version
##   1.43.17. https://CRAN.R-project.org/package=MuMIn
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {MuMIn: Multi-Model Inference},
##     author = {Kamil Bartoń},
##     year = {2020},
##     note = {R package version 1.43.17},
##     url = {https://CRAN.R-project.org/package=MuMIn},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## 
## [[32]]
## 
## To cite ggtree in publications use:
## 
##   Guangchuang Yu. Using ggtree to visualize data on tree-like
##   structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi:
##   10.1002/cpbi.96
## 
##   Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods
##   for mapping and visualizing associated data on phylogeny using
##   ggtree. Molecular Biology and Evolution 2018, 35(2):3041-3043. doi:
##   10.1093/molbev/msy194
## 
##   Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk
##   Lam. ggtree: an R package for visualization and annotation of
##   phylogenetic trees with their covariates and other associated data.
##   Methods in Ecology and Evolution 2017, 8(1):28-36.
##   doi:10.1111/2041-210X.12628
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
## 
## 
## [[33]]
## 
## To cite treeio in publications use:
## 
##   LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR
##   Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu. treeio: an R package
##   for phylogenetic tree input and output with richly annotated and
##   associated data. Molecular Biology and Evolution 2020, 37(2):599-603.
##   doi: 10.1093/molbev/msz240
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {treeio: an R package for phylogenetic tree input and output with richly annotated and associated data.},
##     author = {Li-Gen Wang and Tommy Tsan-Yuk Lam and Shuangbin Xu and Zehan Dai and Lang Zhou and Tingze Feng and Pingfan Guo and Casey W. Dunn and Bradley R. Jones and Tyler Bradley and Huachen Zhu and Yi Guan and Yong Jiang and Guangchuang Yu},
##     year = {2020},
##     journal = {Molecular Biology and Evolution},
##     volume = {37},
##     issue = {2},
##     pages = {599-603},
##     doi = {10.1093/molbev/msz240},
##   }
## 
## 
## [[34]]
## 
## To cite package 'ggnewscale' in publications use:
## 
##   Elio Campitelli (2021). ggnewscale: Multiple Fill and Colour Scales
##   in 'ggplot2'. R package version 0.4.5.
##   https://CRAN.R-project.org/package=ggnewscale
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggnewscale: Multiple Fill and Colour Scales in 'ggplot2'},
##     author = {Elio Campitelli},
##     year = {2021},
##     note = {R package version 0.4.5},
##     url = {https://CRAN.R-project.org/package=ggnewscale},
##   }
## 
## 
## [[35]]
## 
## To cite package 'dplyr' in publications use:
## 
##   Hadley Wickham, Romain François, Lionel Henry and Kirill Müller
##   (2021). dplyr: A Grammar of Data Manipulation. R package version
##   1.0.5. https://CRAN.R-project.org/package=dplyr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {dplyr: A Grammar of Data Manipulation},
##     author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
##     year = {2021},
##     note = {R package version 1.0.5},
##     url = {https://CRAN.R-project.org/package=dplyr},
##   }
## 
## 
## [[36]]
## 
## To cite package 'data.table' in publications use:
## 
##   Matt Dowle and Arun Srinivasan (2021). data.table: Extension of
##   `data.frame`. R package version 1.14.0.
##   https://CRAN.R-project.org/package=data.table
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {data.table: Extension of `data.frame`},
##     author = {Matt Dowle and Arun Srinivasan},
##     year = {2021},
##     note = {R package version 1.14.0},
##     url = {https://CRAN.R-project.org/package=data.table},
##   }
## 
## 
## [[37]]
## 
## To cite package 'xlsx' in publications use:
## 
##   Adrian Dragulescu and Cole Arendt (2020). xlsx: Read, Write, Format
##   Excel 2007 and Excel 97/2000/XP/2003 Files. R package version 0.6.5.
##   https://CRAN.R-project.org/package=xlsx
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {xlsx: Read, Write, Format Excel 2007 and Excel 97/2000/XP/2003 Files},
##     author = {Adrian Dragulescu and Cole Arendt},
##     year = {2020},
##     note = {R package version 0.6.5},
##     url = {https://CRAN.R-project.org/package=xlsx},
##   }
## 
## 
## [[38]]
## 
## To cite package 'gridExtra' in publications use:
## 
##   Baptiste Auguie (2017). gridExtra: Miscellaneous Functions for "Grid"
##   Graphics. R package version 2.3.
##   https://CRAN.R-project.org/package=gridExtra
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {gridExtra: Miscellaneous Functions for "Grid" Graphics},
##     author = {Baptiste Auguie},
##     year = {2017},
##     note = {R package version 2.3},
##     url = {https://CRAN.R-project.org/package=gridExtra},
##   }
## 
## 
## [[39]]
## 
## To cite package 'magick' in publications use:
## 
##   Jeroen Ooms (2021). magick: Advanced Graphics and Image-Processing in
##   R. R package version 2.7.2. https://CRAN.R-project.org/package=magick
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {magick: Advanced Graphics and Image-Processing in R},
##     author = {Jeroen Ooms},
##     year = {2021},
##     note = {R package version 2.7.2},
##     url = {https://CRAN.R-project.org/package=magick},
##   }
## 
## 
## [[40]]
## 
## To cite package 'ggtext' in publications use:
## 
##   Claus O. Wilke (2020). ggtext: Improved Text Rendering Support for
##   'ggplot2'. R package version 0.1.1.
##   https://CRAN.R-project.org/package=ggtext
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggtext: Improved Text Rendering Support for 'ggplot2'},
##     author = {Claus O. Wilke},
##     year = {2020},
##     note = {R package version 0.1.1},
##     url = {https://CRAN.R-project.org/package=ggtext},
##   }
## 
## 
## [[41]]
## 
## To cite package 'ggrepel' in publications use:
## 
##   Kamil Slowikowski (2021). ggrepel: Automatically Position
##   Non-Overlapping Text Labels with 'ggplot2'. R package version 0.9.1.
##   https://CRAN.R-project.org/package=ggrepel
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggrepel: Automatically Position Non-Overlapping Text Labels with
## 'ggplot2'},
##     author = {Kamil Slowikowski},
##     year = {2021},
##     note = {R package version 0.9.1},
##     url = {https://CRAN.R-project.org/package=ggrepel},
##   }
## 
## 
## [[42]]
## 
## To cite package 'egg' in publications use:
## 
##   Baptiste Auguie (2019). egg: Extensions for 'ggplot2': Custom Geom,
##   Custom Themes, Plot Alignment, Labelled Panels, Symmetric Scales, and
##   Fixed Panel Size. R package version 0.4.5.
##   https://CRAN.R-project.org/package=egg
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {egg: Extensions for 'ggplot2': Custom Geom, Custom Themes, Plot
## Alignment, Labelled Panels, Symmetric Scales, and Fixed Panel
## Size},
##     author = {Baptiste Auguie},
##     year = {2019},
##     note = {R package version 0.4.5},
##     url = {https://CRAN.R-project.org/package=egg},
##   }

8 References